Preface


It was our privilege to serve as the program chairs for CAV 2018, the 30th International Conference on Computer-Aided Verification. CAV is an annual conference dedicated to the advancement of the theory and practice of computer-aided formal analysis methods for hardware and software systems. CAV 2018 was held in Oxford, UK, July 14-17, 2018, with a tutorial day on July 13.

This year, CAV was held as part of the Federated Logic Conference (FLoC) and was co-located with many other logic conferences. The primary focus of CAV is to spur advances in hardware and software verification while expanding to new domains such as learning, autonomous systems, and computer security. CAV is at the cutting edge of research in formal methods, as reflected in this year’s program.

CAV 2018 covered a wide spectrum of subjects, from theoretical results to concrete applications, including papers on the application of formal methods in large-scale industrial settings. It has always been one of CAV’s primary interests to include papers that describe practical verification tools, as well as solutions and techniques whose results have high practical appeal. The proceedings of the conference are published in Springer’s Lecture Notes in Computer Science series. A selection of papers was invited to special issues of Formal Methods in System Design and the Journal of the ACM.

This is the first year that the CAV proceedings are published under an Open Access license, thus making them accessible to a broad audience. We hope that this decision will increase the scope of practical applications of formal methods and will attract even more interest from industry.

CAV received a very high number of submissions this year (215 overall), resulting in a highly competitive selection process. We accepted 13 tool papers and 52 regular papers, which amounts to an acceptance rate of roughly 30% (for both regular papers and tool papers). The high number of excellent submissions, in combination with the scheduling constraints of FLoC, forced us to reduce the length of the talks to 15 minutes, giving equal exposure and weight to regular papers and tool papers.

The accepted papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. Other notable topics are synthesis, learning, security, and concurrency in the context of formal methods. The proceedings are organized according to the sessions in the conference.

The program featured two invited talks: by Eran Yahav (Technion), on using deep learning for programming, and by Somesh Jha (University of Wisconsin-Madison), on adversarial deep learning. The invited talks this year reflect the growing interest of the CAV community in deep learning and its connection to formal methods. The tutorial day of CAV featured two invited tutorials, by Shaz Qadeer on verification of concurrent programs and by Matteo Maffei on static analysis of smart contracts. The subjects of the tutorials reflect the increasing volume of research on verification of concurrent software and, more recently, on the correctness of smart contracts. As every year, one of the winners of the CAV award also contributed a presentation. The tutorial day featured a workshop in memory of Mike Gordon, titled “Three Research Vignettes in Memory of Mike Gordon,” organized by Tom Melham and jointly supported by the CAV and ITP communities.

Moreover, we continued the tradition of organizing a LogicLounge. Initiated by the late Helmut Veith at the Vienna Summer of Logic 2014, the LogicLounge is a series of discussions on computer science topics targeting a general audience and has become a regular highlight at CAV. This year’s LogicLounge took place at the Oxford Union and was on the topic of “Ethics and Morality of Robotics,” moderated by Judy Wajcman and featuring a panel of experts: Luciano Floridi, Ben Kuipers, Francesca Rossi, Matthias Scheutz, Sandra Wachter, and Jeannette Wing. We thank May Chan, Katherine Fletcher, and Marta Kwiatkowska for organizing this event, and the Vienna Center for Logic and Algorithms for their support.

CAV attendees also enjoyed a number of FLoC plenary talks and events targeting the broad FLoC community.

In addition to the main conference, CAV hosted the Verification Mentoring Workshop for junior scientists entering the field and a large number of pre- and post-conference technical workshops: the Workshop on Formal Reasoning in Distributed Algorithms (FRIDA), the workshop on Runtime Verification for Rigorous Systems Engineering (RV4RISE), the 5th Workshop on Horn Clauses for Verification and Synthesis (HCVS), the 7th Workshop on Synthesis (SYNT), the First International Workshop on Parallel Logical Reasoning (PLR), the 10th Working Conference on Verified Software: Theories, Tools and Experiments (VSTTE), the Workshop on Machine Learning for Programming (MLP), the 11th International Workshop on Numerical Software Verification (NSV), the Workshop on Verification of Engineered Molecular Devices and Programs (VEMDP), the Third Workshop on Fun With Formal Methods (FWFM), the Workshop on Robots, Morality, and Trust through the Verification Lens, and the IFAC Conference on Analysis and Design of Hybrid Systems (ADHS).

The Program Committee (PC) for CAV consisted of 80 members; we kept the 
number large to ensure each PC member would have a reasonable number of papers to 
review and be able to provide thorough reviews. As the review process for CAV is 
double-blind, we kept the number of external reviewers to a minimum, to avoid 
accidental disclosures and conflicts of interest. Altogether, the reviewers drafted over 
860 reviews and made an enormous effort to ensure a high-quality program. Following 
the tradition of CAV in recent years, the artifact evaluation was mandatory for tool 
submissions and optional but encouraged for regular submissions. We used an Artifact 
Evaluation Committee of 25 members. Our goal for artifact evaluation was to provide 
friendly “beta-testing” to tool developers; we recognize that developing a stable tool on 
a cutting-edge research topic is certainly not easy and we hope the constructive 
comments provided by the Artifact Evaluation Committee (AEC) were of help to the 
developers. As a result of the evaluation, the AEC accepted 25 of 31 artifacts accompanying regular papers; moreover, all 13 accepted tool papers passed the evaluation. We are grateful to the reviewers for their outstanding efforts in making sure each paper was fairly assessed. We would like to thank our artifact evaluation chair, Igor Konnov, and the AEC for evaluating all artifacts submitted with tool papers as well as optional artifacts submitted with regular papers.

Of course, without the tremendous effort put into the review process by our PC 
members this conference would not have been possible. We would like to thank the PC 
members for their effort and thorough reviews. 

We would like to thank the FLoC chairs, Moshe Vardi, Daniel Kroening, and Marta 
Kwiatkowska, for the support provided, Thanh Hai Tran for maintaining the CAV 
website, and the always helpful Steering Committee members Orna Grumberg, Aarti 
Gupta, Daniel Kroening, and Kenneth McMillan. Finally, we would like to thank the 
team at the University of Oxford, who took care of the administration and organization 
of FLoC, thus making our jobs as CAV chairs much easier. 
Abstract. We present a graph-based tool for analysing Java programs
operating on dynamic data structures. It involves the generation of 
an abstract state space employing a user-defined graph grammar. LTL 
model checking is then applied to this state space, supporting both 
structural and functional correctness properties. The analysis is fully 
automated, procedure-modular, and provides informative visual feedback 
including counterexamples in the case of property violations. 


1 Introduction 


Pointers constitute an essential concept in modern programming languages and are used for implementing dynamic data structures like lists, trees, etc. However, many software bugs can be traced back to the erroneous use of pointers, e.g. by dereferencing null pointers or accidentally pointing to wrong parts of the heap. Due to the resulting unbounded state spaces, pointer errors are hard to detect. Automated tool support for the validation of pointer programs that provides meaningful debugging information in case of violations is therefore highly desirable.

ATTESTOR is a verification tool that attempts to achieve both of these goals. To this aim, it first constructs an abstract state space of the input program by means of symbolic execution. Each state depicts both the links between heap objects and the values of program variables using a graph representation. Abstraction is performed at the state level by means of graph grammars. They specify the data structures maintained by the program and describe how to summarise substructures of the heap in order to obtain a finite representation. After each state has been labelled with propositions that provide information about structural properties such as reachability or heap shapes, the actual verification task is performed in a second step. To this aim, the abstract state space is checked against a user-defined LTL specification. In case of violations, a counterexample is provided.


H. Arndt and C. Matheja: Supported by Deutsche Forschungsgemeinschaft (DFG) Grant No. 401/2-1.
© The Author(s) 2018 
In summary, ATTESTOR’s main features can be characterized as follows:


- It employs context-free graph grammars as a formal underpinning for defining heap abstractions. These grammars enable local heap concretisation and thus naturally provide implicit abstract semantics.

- The full instruction set of Java Bytecode is handled. Program actions that are outside the scope of our analysis, such as arithmetic operations or Boolean tests on payload data, are handled by (safe) over-approximation.

- Specifications are given by linear-time temporal logic (LTL) formulae, which support a rich set of program properties, ranging from memory safety, shape, reachability, and balancedness to properties such as full traversal or preservation of the exact heap structure.

- Apart from a graph grammar that specifies the data structures handled by a program, the analysis is fully automated. In particular, no program annotations are required.

- Modular reasoning is supported in the form of contracts that summarise the effect of executing a (recursive) procedure. These contracts can be automatically derived or manually specified.

- Valuable feedback is provided through a comprehensive report, including (minimal) non-spurious counterexamples in case of property violations.

- The tool’s functionality is made accessible through the command line as well as through a graphical user interface and an application programming interface.


Availability. ATTESTOR’s source code, benchmarks, and documentation are avail- 
able online at https://moves-rwth.github.io/attestor. 


2 The Attestor Tool 


ATTESTOR is implemented in Java and consists of about 20,000 LOC (excluding
comments and tests). An architectural overview is depicted in Fig. 1. It shows the 
tool inputs (left), its outputs (right), the ATTESTOR backend with its processing 
phases (middle), the ATTESTOR frontend (below) as well as the API connecting 
back- and frontend. These elements are discussed in detail below. 


2.1 Input 


As shown in Fig. 1 (left), a verification task is given by four inputs. The first is the program to be analysed. Here, Java as well as Java Bytecode programs with possibly recursive procedures are supported, where the former are translated to the latter prior to the analysis. Second, the specification has to be given as a set of LTL formulae enriched with heap-specific propositions. See Sect. 3 for a representative list of example specifications.

As a third input, ATTESTOR expects the declaration of the graph grammar that guides the abstraction. In order to obtain a finite abstract state space, this grammar is supposed to cover the data structures emerging during program execution. The user may choose from a set of grammar definitions for standard data structures, such as singly- and doubly-linked lists and binary trees, specify grammars manually in a JSON-style graph format, or combine both.

Fig. 1. The ATTESTOR tool

Fourth, additional options can be given that, e.g., define the initial heap configuration(s) (in JSON-style graph format), control the granularity of abstraction and the garbage collection behaviour, or allow re-using results of previous analyses in the form of procedure contracts [11,13].


2.2 Phases 


ATTESTOR proceeds in six main phases, see Fig. 1 (middle). In the first and second phase, all inputs are parsed and preprocessed: the input program is read and transformed to Bytecode (if necessary), and the input graphs (initial configuration, procedure contracts, and graph grammar), LTL formulae, and further options are read.

In the third phase, depending on the provided LTL formulae, additional markings are inserted into the initial heap (see [8] for details). They are used to track the identities of objects during program execution, which is later required to validate visit and neighbourhood properties during the fifth phase.

In the fourth phase, the actual program analysis is conducted. To this aim, ATTESTOR first constructs the abstract state space, as described in detail in Sect. 2.3. In the fifth phase, we check whether the provided LTL specification holds on the state space resulting from the preceding step. We use an off-the-shelf tableau-based LTL model checking algorithm [2].
If desired, the results of all phases are forwarded to the API to make them accessible to the frontend or directly to the user. We address this output in Sect. 2.4.


2.3 Abstract State Space Generation 


The core module of ATTESTOR is the abstract state space generation. It employs an abstraction approach based on hyperedge replacement grammars, whose theoretical underpinnings are described in detail in [9]. It is centred around a graph-based representation of the heap that contains concrete parts side by side with placeholders representing a set of heap fragments of a certain shape. The state space generation loop as implemented in ATTESTOR is shown in Fig. 2.
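To illustrate how placeholders are introduced and removed, the following Python sketch models a single singly-linked list whose edges are either concrete `next` pointers or a placeholder `LS` standing for a non-empty chain of `next` pointers. It assumes the hypothetical grammar LS -> next | next LS | LS LS; neither the grammar nor the names are ATTESTOR's actual grammar format or API. Abstraction applies LS -> next and LS -> LS LS backwards to collapse every run of edges through unnamed nodes, while concretisation applies LS -> next and LS -> next LS forwards, which produces two branches. The head, the null terminal, and nodes referenced by variables are always kept concrete.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

NEXT = "next"  # concrete pointer field
LS = "LS"      # hypothetical nonterminal: a chain of one or more next pointers


@dataclass(frozen=True)
class Heap:
    """A single list from node 0 to a terminal null node (the last node)."""
    labels: Tuple[FrozenSet[str], ...]  # variables pointing to each node
    edges: Tuple[str, ...]              # edges[i] connects node i to node i + 1

    def named(self, i: int) -> bool:
        # The head, the null terminal and variable targets stay concrete.
        return i == 0 or i == len(self.edges) or bool(self.labels[i])


def abstract(h: Heap) -> Heap:
    """Backward rule application: every run of two or more edges whose inner
    nodes are unnamed is summarised by a single LS edge."""
    labels = [h.labels[0]]
    edges: List[str] = []
    run = 0
    for i, edge in enumerate(h.edges):
        run += 1
        if h.named(i + 1):  # the run ends at a node that must stay concrete
            edges.append(edge if run == 1 else LS)
            labels.append(h.labels[i + 1])
            run = 0
    return Heap(tuple(labels), tuple(edges))


def concretise(h: Heap, i: int) -> List[Heap]:
    """Forward rule application on the outgoing edge of node i, e.g. before
    dereferencing a variable that points to node i. LS branches into
    'exactly one next' and 'next followed by a shorter LS'."""
    if h.edges[i] != LS:
        return [h]  # already concrete: no branching
    before, after = list(h.edges[:i]), list(h.edges[i + 1:])
    one = Heap(h.labels, tuple(before + [NEXT] + after))
    more = Heap(
        h.labels[: i + 1] + (frozenset(),) + h.labels[i + 1:],  # fresh unnamed node
        tuple(before + [NEXT, LS] + after),
    )
    return [one, more]


if __name__ == "__main__":
    # x points to the head of a concrete list with five next pointers.
    concrete = Heap((frozenset({"x"}),) + (frozenset(),) * 5, (NEXT,) * 5)
    summary = abstract(concrete)
    print(summary.edges)                               # ('LS',)
    print([b.edges for b in concretise(summary, 0)])   # [('next',), ('next', 'LS')]
```

Abstracting the second branch again yields the original summary, which mirrors how concretisation and abstraction are complementary uses of the same grammar rules.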
Initially, it is provided with the initial program state(s), that is, the program counter corresponding to the starting statement together with the initial heap configuration(s). From these, ATTESTOR picks a state at random and applies the abstract semantics of the next statement. First, the heap configuration is locally concretised, ensuring that all heap parts required for the statement to execute are accessible. This is enabled by applying rules of the input graph grammar in forward direction, which can entail branching in the state space. The resulting configurations are then manipulated according to the concrete semantics of the statement. At this stage, ATTESTOR automatically detects possible null pointer dereferences as a byproduct of the state space generation. In a subsequent rectification step, the heap configuration is cleared of, e.g., dead variables and garbage (if desired); consequently, memory leaks are detected immediately. The rectified configuration is then abstracted with respect to the data structures specified by the input graph grammar. Complementary to concretisation, this is realised by applying grammar rules in backward direction, which involves a check for embeddings of right-hand sides. A particular strength of our approach is its robustness against local violations of data structures, as it simply leaves the corresponding heap parts concrete. Finalising the abstract execution step, the resulting state is labelled with the atomic propositions it satisfies. This check is efficiently implemented by means of heap automata (see [12,15] for details). By performing a subsumption check at the state level, ATTESTOR detects whether the newly generated state is already covered by a more abstract one that has been visited before. If not, it adds the resulting state to the state space and starts over by picking a new state. Otherwise, it checks whether further states have to be processed or whether a fixpoint in the state space generation is reached. In the latter case, this phase is terminated.

Fig. 2. State space generation.

Fig. 3. Screenshot of ATTESTOR’s frontend for state space exploration. (Color figure online)
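The state space generation loop is, in essence, a worklist exploration with a subsumption check. The following Python sketch shows that skeleton under simplifying assumptions: a single hypothetical `successors` function stands in for concretisation, execution, rectification, abstraction, and labelling; a hypothetical `covered_by(new, old)` predicate stands in for the subsumption check; and states are taken in FIFO order rather than at random. It is an illustration, not ATTESTOR's implementation.

```python
from collections import deque
from typing import Callable, Deque, Hashable, Iterable, List

State = Hashable


def generate_state_space(
    initial: Iterable[State],
    successors: Callable[[State], Iterable[State]],
    covered_by: Callable[[State, State], bool],
) -> List[State]:
    """Explore states until no unprocessed state is left (the fixpoint)."""
    states: List[State] = []
    worklist: Deque[State] = deque()

    def add_if_new(state: State) -> None:
        # Subsumption check: skip states covered by an already visited one.
        if not any(covered_by(state, old) for old in states):
            states.append(state)
            worklist.append(state)

    for state in initial:
        add_if_new(state)
    while worklist:
        current = worklist.popleft()
        for succ in successors(current):  # may branch after concretisation
            add_if_new(succ)
    return states


def toy_successors(state):
    """Hypothetical program: a loop (pc 0) appends to a list or exits (pc 1).
    The list length is abstracted to 0, 1 or 2 (meaning 'two or more')."""
    pc, length = state
    if pc == 0:
        return [(0, min(length + 1, 2)), (1, length)]
    return []


if __name__ == "__main__":
    space = generate_state_space([(0, 0)], toy_successors, lambda new, old: new == old)
    print(len(space))  # 6
```

Because the abstraction bounds the list length, the loop terminates even though the toy program could append arbitrarily often; a coarser `covered_by` prunes more states at the price of precision.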


2.4 Output 


As shown in Fig. 1 (right), we obtain three main outputs once the analysis is completed: the computed abstract state space, the derived procedure contracts, and the model checking results. For each LTL formula in the specification, the result is one of “formula satisfied”, “formula (definitely) not satisfied”, or “formula possibly not satisfied”. In the latter two cases, ATTESTOR additionally produces a counterexample, i.e. an abstract trace that violates the formula. If ATTESTOR was able to verify the non-spuriousness of this counterexample (second case), we are additionally given a concrete initial heap that is accountable for the violation and that can be used as a test case for debugging.
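The three possible answers per formula can be summarised by the small Python sketch below. The `Verdict` names and the `find_concrete_heap` callback (which returns a concrete initial heap if the abstract counterexample is confirmed to be non-spurious, and `None` otherwise) are hypothetical and not part of ATTESTOR's API; traces and heaps are plain strings for brevity.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Verdict(Enum):
    SATISFIED = "formula satisfied"
    NOT_SATISFIED = "formula (definitely) not satisfied"
    POSSIBLY_NOT_SATISFIED = "formula possibly not satisfied"


def classify(
    counterexample: Optional[Sequence[str]],
    find_concrete_heap: Callable[[Sequence[str]], Optional[str]],
) -> Tuple[Verdict, Optional[Sequence[str]], Optional[str]]:
    """Return (verdict, abstract trace, concrete initial heap or None)."""
    if counterexample is None:
        return Verdict.SATISFIED, None, None
    heap = find_concrete_heap(counterexample)  # hypothetical spuriousness check
    if heap is not None:
        # Non-spurious: the concrete heap can serve as a debugging test case.
        return Verdict.NOT_SATISFIED, counterexample, heap
    return Verdict.POSSIBLY_NOT_SATISFIED, counterexample, None
```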

Besides the main outputs, ATTESTOR provides general information about the 
current analysis. These include log messages such as warnings and errors, but 
also details about settings and runtimes of the analyses. The API provides the 
interface to retrieve ATTESTOR’s outputs as JSON-formatted data. 


2.5 Frontend 


ATTESTOR features a graphical frontend that visualises inputs as well as results of all benchmark runs. The frontend communicates with ATTESTOR’s backend via the API only. In particular, it can be used to display and navigate through the generated abstract state space and counterexample traces.

A screenshot of the frontend for state space exploration is shown in Fig. 3. The left panel is an excerpt of the state space. The right panel depicts the currently selected state, where red boxes correspond to variables and constants, circles correspond to allocated objects/locations, and yellow boxes correspond to nonterminals of the employed graph grammar. Arrows between two circles represent pointers. Further information about the selected state is provided in the topmost panel. Graphs are rendered using cytoscape.js [6].


3 Evaluation 


Tool Comparison. While there is a plethora of tools for analysing pointer programs, such as, amongst others, FORESTER [10], GROOVE [7], INFER [5], HIP/SLEEK [17], KORAT [16], JUGGRNAUT [9], and TVLA [3], these tools differ in multiple dimensions:

- Input languages range from C code (FORESTER, INFER, HIP/SLEEK) over Java/Java Bytecode (JUGGRNAUT, KORAT) to assembly code (TVLA) and graph programs (GROOVE).

- The degree of automation differs heavily: tools like FORESTER and INFER only require source code. Others, such as HIP/SLEEK and JUGGRNAUT, additionally expect general data structure specifications, e.g. in the form of graph grammars or predicate definitions, to guide the abstraction. Moreover, TVLA requires additional program-dependent instrumentation predicates.

- Verifiable properties typically cover memory safety. KORAT is an exception, because it applies test case generation instead of verification. The tools HIP/SLEEK, TVLA, GROOVE, and JUGGRNAUT are additionally capable of verifying data structure invariants, so-called shape properties. Furthermore, HIP/SLEEK is able to reason about shape-numeric properties, e.g. lengths of lists, if a suitable specification is provided. While these properties are not supported by TVLA, it is able to verify reachability properties. Moreover, JUGGRNAUT can reason about temporal properties, such as verifying that eventually every element of an input data structure has been accessed.


Benchmarks. Due to the above-mentioned diversity, there is no publicly available and representative set of standardised benchmarks to compare the aforementioned tools [1]. We thus evaluated ATTESTOR on a collection of challenging, pointer-intensive algorithms compiled from the literature [3,4,10,14]. To assess our counterexample generation, we considered invalid specifications, e.g. that a reversed list is the same list as the input list. Furthermore, we injected faults into our examples by swapping and deleting statements.


Properties. During state space generation, memory safety (M) is checked. Moreover, we consider five classes of properties that are verified using the built-in LTL model checker:
Table 1. The experimental results. All runtimes are in seconds. Verification time includes state space generation. SLL (DLL) means singly-linked (doubly-linked) list.

| Benchmark | Properties | No. states (min) | No. states (max) | State space gen. (min) | State space gen. (max) | Verification (min) | Verification (max) | Total runtime (min) | Total runtime (max) |
|---|---|---|---|---|---|---|---|---|---|
| SLL.traverse | M, S, R, V, N, X | 13 | 97 | 0.030 | 0.074 | 0.039 | 0.097 | 0.757 | 0.848 |
| SLL.reverse | M, S, R, V, X | 46 | 268 | 0.050 | 0.109 | 0.050 | 0.127 | 0.793 | 0.950 |
| SLL.reverse (recursive) | M, S, V, N, X | 40 | 823 | 0.038 | 0.100 | 0.044 | 0.117 | 0.720 | 0.933 |
| DLL.reverse | M, S, R, V, N, X | 70 | 1508 | 0.076 | 0.646 | 0.097 | 0.712 | 0.831 | 1.763 |
| DLL.findLast | M, C, X | 44 | 44 | 0.069 | 0.069 | 0.079 | 0.079 | 0.938 | 0.938 |
| SLL.findMiddle | M, S, R, V, N, X | 75 | 456 | 0.060 | 0.184 | 0.060 | 0.210 | 0.767 | 0.975 |
| Tree.traverse (Lindstrom) | M, S, V, N | 229 | 67941 | 0.119 | 8.901 | 0.119 | 16.52 | 0.845 | 17.36 |
| Tree.traverse (recursive) | M, S | 91 | 21738 | 0.075 | 1.714 | 0.074 | 1.765 | 0.849 | 2.894 |
| AVLTree.binarySearch | M, S | 192 | 192 | 0.117 | 0.172 | 0.118 | 0.192 | 0.917 | 1.039 |
| AVLTree.searchAndBack | M, S, C | 455 | 455 | 0.193 | 0.229 | 0.205 | 0.289 | 1.081 | 1.335 |
| AVLTree.searchAndSwap | M, S, C | 3855 | 4104 | 0.955 | 1.590 | 1.004 | 1.677 | 1.928 | 2.521 |
| AVLTree.leftMostInsert | M, S | 6120 | 6120 | 1.879 | 1.942 | 1.932 | 1.943 | 2.813 | 2.817 |
| AVLTree.insert | M, S | 10388 | 10388 | 3.378 | 3.676 | 3.378 | 3.802 | 4.284 | 4.720 |
| AVLTree.sllToAVLTree | M, S, C | 7166 | 7166 | 2.412 | 2.728 | 2.440 | 2.759 | 3.383 | 3.762 |


- The shape property (S) establishes that the heap is of a specific shape, e.g. a doubly-linked list or a balanced tree.

- The reachability property (R) checks whether some variable is reachable from another one via specific pointer fields.

- The visit property (V) verifies whether every element of the input is accessed by a specific variable.

- The neighbourhood property (N) checks whether the input data structure coincides with the output data structure upon termination.

- Finally, we consider other functional correctness properties (C), e.g. the return value is not null.
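For intuition about the reachability property (R), the sketch below checks it on a single concrete heap by breadth-first search restricted to the given pointer fields. The dictionary heap encoding and all names are assumptions for illustration. ATTESTOR itself evaluates such propositions on abstract states by means of heap automata, which this sketch does not model. An object counts as reachable from itself (zero steps), and a null variable is treated as reaching nothing.

```python
from collections import deque
from typing import Dict, Iterable, Optional

# Hypothetical concrete heap: object -> {field name: target object or None}
Heap = Dict[str, Dict[str, Optional[str]]]


def reachable(heap: Heap, env: Dict[str, Optional[str]], source: str,
              target: str, fields: Iterable[str]) -> bool:
    """True if the object held by `target` is reachable from the object held
    by `source` using only pointers stored in `fields`."""
    start, goal = env[source], env[target]
    if start is None or goal is None:
        return False  # assumption: null reaches nothing and is never reached
    field_list = tuple(fields)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for field in field_list:
            succ = heap.get(node, {}).get(field)
            if succ is not None and succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return False


if __name__ == "__main__":
    # A doubly-linked list a <-> b <-> c with x at the head and y at the tail.
    dll: Heap = {
        "a": {"next": "b", "prev": None},
        "b": {"next": "c", "prev": "a"},
        "c": {"next": None, "prev": "b"},
    }
    env = {"x": "a", "y": "c"}
    print(reachable(dll, env, "x", "y", ["next"]))  # True
    print(reachable(dll, env, "y", "x", ["next"]))  # False
```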


Setup. For performance evaluation, we conducted experiments on an Intel Core i7-7500U CPU @ 2.70 GHz with the Java virtual machine (OpenJDK version 1.8.0_151) limited to its default setting of 2 GB of RAM. All experiments were run using the Java benchmarking harness JMH. Our experimental results are shown in Table 1. Additionally, for comparison purposes, we considered Java implementations of benchmarks that have previously been analysed for memory safety by FORESTER [10], see Table 2.
Discussion. The results show that both memory safety (M) and shape (S) are efficiently processed, with regard to both state space size and runtime. This is not surprising, as these properties are directly handled by the state space generation engine. The most challenging tasks are the visit (V) and neighbourhood (N) properties, as they require tracking objects across program executions by means of markings. The latter have a similar impact as pointer variables: increasing their number impedes abstraction, as larger parts of the heap have to be kept concrete. This effect can be observed for the Lindstrom tree traversal procedure, where adding one marking (V) and three markings (N) both increase the verification effort by an order of magnitude.

Table 2. FORESTER benchmarks (memory safety only). Verification times are in seconds.

| Benchmark | No. states | Verification |
|---|---|---|
| SLL.bubblesort | 287 | 0.134 |
| SLL.deleteElement | 152 | 0.096 |
| SLLHeadPtr (traverse) | 111 | 0.095 |
| SLL.insertsort | 369 | 0.147 |
| ListOfCyclicLists | 313 | 0.153 |
| DLL.insert | 379 | 0.207 |
| DLL.insertsort1 | 4302 | 1.467 |
| DLL.insertsort2 | 1332 | 0.514 |
| DLL.buildAndReverse | 277 | 0.164 |
| CyclicDLL (traverse) | 104 | 0.108 |
| Tree.construct | 4 | 0.062 |
| Tree.constructAndDSW | 1334 | 0.365 |
| SkipList.insert | 302 | 0.160 |
| SkipList.build | 330 | 0.173 |
Abstract. We present TYPPETE, a sound type inferencer that automatically infers Python 3 type annotations. TYPPETE encodes type constraints as a MAXSMT problem and uses optional constraints and specific quantifier instantiation patterns to make the constraint solving process efficient. Our experimental evaluation shows that TYPPETE scales to real-world Python programs and outperforms state-of-the-art tools.


1 Introduction 


Dynamically typed languages like Python have become increasingly popular in the past five years. Dynamic typing enables rapid development and adaptation to changing requirements. On the other hand, static typing offers early error detection, efficient execution, and machine-checked code documentation, and enables more advanced static analysis and verification approaches [15].

For these reasons, Python’s PEP 484 [25] has recently introduced optional type annotations in the spirit of gradual typing [23]. The annotations can be checked using MyPy [10]. In this paper, we present our tool TYPPETE, which automatically infers sound (non-gradual) type annotations and can therefore serve as a preprocessor for other analysis or verification tools.

TYPPETE performs whole-program type inference, as there are no principal typings in object-oriented languages like Python [1, example in Sect. 1]; the inferred types are correct in the given context but may not be as general as possible. The type inference is constraint-based and relies on the off-the-shelf SMT solver Z3 [7] for finding a valid type assignment for the input program. We show that two main ingredients allow TYPPETE to scale to real programs: (1) a careful encoding of subtyping that leverages efficient quantifier instantiation techniques [6], and (2) the use of optional type equality constraints, which considerably reduce the solution search space. Whenever a valid type assignment for the input program cannot be found, TYPPETE encodes type error localization as an optimization problem [19] and reports only a minimal set of unfulfilled constraints to help the user pinpoint the cause of the error.
© The Author(s) 2018 
    from abc import ABCMeta, abstractmethod

 1  class Item(metaclass=ABCMeta):
 2      @abstractmethod
 3      def compete(self, item):
 4          pass
 5
 6      def evalEven(self, item):
 7          return "WIN"
 8
 9      def evalOdd(self, item):
10          return "LOSE"
11
12  class Even(Item):
13      def compete(self, item):
14          return item.evalEven(self)
15
16  class Odd(Item):
17      def compete(self, item):
18          return item.evalOdd(self)
19
20  def match(item1, item2):
21      return item1.compete(item2)


Fig. 1. A Python implementation of the odds and evens hand game. 


TYPPETE accepts programs written in (a large subset of) Python 3. Having a static type system imposes a number of requirements on Python programs: (a) a variable can only have a single type throughout the whole program; (b) generic types have to be homogeneous (e.g., all elements of a set must have the same type); and (c) dynamic code generation, reflection, and dynamic attribute additions and deletions are not allowed. The supported type system includes generic classes and functions. Users must supply a file and the number of type variables for any generic class or function. TYPPETE then outputs a program with type annotations, a type error, or an error indicating the use of unsupported language features.

Our experimental evaluation demonstrates the practical applicability of our 
approach. We show that TYPPETE performs well on a variety of real-world open 
source Python programs and outperforms state-of-the-art tools. 


2 Constraint Generation 


TYPPETE encodes the type inference problem for a Python program into an 
SMT constraint resolution problem such that any solution of the SMT problem 
yields a valid type assignment for the program. The process of generating the 
SMT problem consists of three phases, which we describe below. 

In a first pass over the input program, TYPPETE collects: (1) all globally defined names (to resolve forward references), (2) all classes and their respective subclass relations (to define subtyping), and (3) upper bounds on the size of certain types (e.g., tuples and function parameters). This pre-analysis encompasses both the input program (including all transitively imported modules) and stub files, which define the types of built-in classes and functions as well as libraries. TYPPETE already contains stubs for the most common built-ins; users can add custom stub files written in the format that is supported by MyPy.

In the second phase, TYPPETE declares an algebraic datatype Type, whose members correspond one-to-one to Python types. TYPPETE declares one datatype constructor for every class in the input program; non-generic classes are represented as constants, whereas a generic class with n type parameters is represented by a constructor taking n arguments of type Type. As an example, the class Odd in Fig. 1 is represented by the constant $class_{Odd}$. TYPPETE also declares constructors for tuples and functions up to the maximum size determined in the pre-analysis, and for all type variables used in generic functions and classes.
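The constructor layout can be pictured with the following Python sketch, which only computes, for each constructor name, how many Type arguments it takes; it does not build an SMT datatype. The naming scheme (`class_C`, `tuple_k`, `f_k`, `tv_T`) is an assumption that loosely follows the text, where `f_2` denotes a function type with two parameter types and one return type, and the generic class `Box` in the usage example is hypothetical.

```python
from typing import Dict, Iterable


def declare_constructors(class_arity: Dict[str, int], max_tuple: int,
                         max_params: int, type_vars: Iterable[str]) -> Dict[str, int]:
    """Map each Type constructor name to the number of Type arguments it takes.
    max_tuple and max_params play the role of the bounds from the pre-analysis."""
    ctors: Dict[str, int] = {}
    for cls, n in class_arity.items():
        ctors[f"class_{cls}"] = n   # n == 0: the class is a constant
    for k in range(max_tuple + 1):
        ctors[f"tuple_{k}"] = k     # one argument per tuple element
    for k in range(max_params + 1):
        ctors[f"f_{k}"] = k + 1     # k parameter types plus one return type
    for tv in type_vars:
        ctors[f"tv_{tv}"] = 0       # type variables are constants as well
    return ctors


if __name__ == "__main__":
    ctors = declare_constructors(
        {"object": 0, "Item": 0, "Odd": 0, "Even": 0, "Box": 1},
        max_tuple=2, max_params=2, type_vars=["T"],
    )
    print(ctors["class_Odd"], ctors["class_Box"], ctors["f_2"])  # 0 1 3
```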
The subtype relation <: is represented by an uninterpreted function subtype, which maps pairs of types to a Boolean value. This function is delicate to define because of the possibility of matching loops (i.e., axioms being endlessly instantiated [7]) in the SMT solver. For each datatype constructor, TYPPETE generates axioms that explicitly enumerate the possible subtypes and supertypes. As an example, for the type $class_{Odd}$, TYPPETE generates the following axioms:

$$\forall t.\ subtype(class_{Odd}, t) = (t = class_{Odd} \lor t = class_{Item} \lor t = class_{object})$$

$$\forall t.\ subtype(t, class_{Odd}) = (t = class_{None} \lor t = class_{Odd})$$


Note that the second axiom allows None to be a subtype of any other type (as 
in Java). As we discuss in the next section, this definition of subtype allows us to 
avoid matching loops by specifying specific instantiation patterns for the SMT 
solver. A substitution function substitute, which substitutes type arguments for 
type variables when interacting with generic types, is defined in a similar way. 
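The enumeration in these axioms can be precomputed from the class hierarchy. The sketch below does this in plain Python and renders the axioms as text rather than as Z3 terms. The hierarchy is an assumption: Odd and Even inherit from Item, which inherits from object. This is consistent with the axioms above. The string `none` stands for the type of None, which is treated as a subtype of every type.

```python
from typing import Dict, List, Set

NONE = "none"  # None is a subtype of every type, as in the axioms above

# Assumed hierarchy for illustration: Odd and Even inherit from Item,
# which inherits from object.
BASES: Dict[str, List[str]] = {
    "object": [], "Item": ["object"], "Odd": ["Item"], "Even": ["Item"],
}


def supertypes(cls: str) -> Set[str]:
    """Reflexive-transitive closure of the base-class relation."""
    if cls == NONE:
        return set(BASES) | {NONE}
    seen: Set[str] = set()
    todo = [cls]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(BASES[c])
    return seen


def subtypes(cls: str) -> Set[str]:
    """Every class whose supertypes include cls, plus none."""
    return {c for c in BASES if cls in supertypes(c)} | {NONE}


def axioms(cls: str) -> List[str]:
    """Render the two enumerating axioms for one class constructor."""
    def disj(names: Set[str]) -> str:
        return " ∨ ".join(f"t = class_{n}" for n in sorted(names))
    return [f"∀t. subtype(class_{cls}, t) = ({disj(supertypes(cls))})",
            f"∀t. subtype(t, class_{cls}) = ({disj(subtypes(cls))})"]


for line in axioms("Odd"):
    print(line)
# ∀t. subtype(class_Odd, t) = (t = class_Item ∨ t = class_Odd ∨ t = class_object)
# ∀t. subtype(t, class_Odd) = (t = class_Odd ∨ t = class_none)
```

Each axiom has one side fixed to a concrete constructor. This is what makes a trigger on that side useful: once one argument is known, the other is enumerated directly.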

In the third step, TYPPETE traverses the program while creating an SMT 
variable for each node in its abstract syntax tree, and generating type constraints 
over these variables for the constructs in the program. During the traversal, a 
context maps all defined names (i.e., program variables, fields, etc.) to the corre- 
sponding SMT variables. The context is later used to retrieve the type assigned 
by the SMT solver to each name in the program. Constraints are generated for 
expressions (e.g., call arguments are subtypes of the corresponding parameter 
types), statements (e.g., the right-hand side of an assignment is a subtype of
the left-hand side), and larger constructs such as methods (e.g., covariance and
contravariance constraints for method overrides). For example, the (simplified) 
constraint generated for the call to item1.compete(item2) at line 21 in Fig. 1 
contains a disjunction of cases depending on the type of the receiver: 


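$$
\begin{aligned}
&(v_{item1} = \mathit{class}_{Odd} \land \mathit{compete}_{Odd} = f_2(\mathit{class}_{Odd}, arg, ret) \land \mathit{subtype}(v_{item2}, arg))\\
\lor\ &(v_{item1} = \mathit{class}_{Even} \land \mathit{compete}_{Even} = f_2(\mathit{class}_{Even}, arg, ret) \land \mathit{subtype}(v_{item2}, arg))
\end{aligned}
$$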


where $f_2$ is a datatype constructor for a function with two parameter types (and
one return type ret), and $v_{item1}$ and $v_{item2}$ are the SMT variables corresponding
to item1 and item2, respectively.
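This per-class disjunction can be produced mechanically from the classes that define the called method. The helper below is a hypothetical, string-based sketch that handles single-argument methods only. It assumes the candidate receiver classes (here Odd and Even) are already known, and that each method signature is encoded with `f_2` as above.

```python
from typing import List


def call_constraint(receiver: str, arg: str, method: str,
                    classes: List[str]) -> str:
    """Simplified constraint for receiver.method(arg): one disjunct per
    candidate receiver class that defines `method`."""
    cases = [
        f"({receiver} = class_{c} ∧ "
        f"{method}_{c} = f_2(class_{c}, arg, ret) ∧ "
        f"subtype({arg}, arg))"
        for c in classes
    ]
    return " ∨ ".join(cases)


print(call_constraint("v_item1", "v_item2", "compete", ["Odd", "Even"]))
```

The printed formula is the two-case constraint from the text, written in plain-text notation.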

The generated constraints guarantee that any solution yields a correct type
assignment for the input program. However, there are often many different valid
solutions, as the constraints only impose lower or upper bounds on the types
represented by the SMT variables (e.g., $\mathit{subtype}(v_{item2}, arg)$ shown above
imposes only an upper bound on the type of $v_{item2}$). This has an impact on performance
(cf. Sect. 4) as the search space for a solution remains large. Moreover, some type
assignments could be more desirable than others for a user (e.g., a user would
most likely prefer to assign type int rather than object to a variable
initialized with value zero). To avoid these problems, TYPPETE additionally generates
optional type equality constraints in places where the mandatory constraints only
demand subtyping (i.e., local variable assignments, return statements, passed
function arguments), thereby turning the SMT problem into a MAXSMT
optimization problem. For instance, in addition to $\mathit{subtype}(v_{item2}, arg)$ shown above,
TYPPETE generates the optional equality constraint $v_{item2} = arg$. The optional
constraints guide the solver to try the specified exact type first, which is often
a correct choice and therefore improves performance, and additionally leads to
solutions with more precise variable and parameter types.
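A toy instance shows the effect of the optional equality constraints. The sketch below replaces Z3's MaxSMT engine with brute-force enumeration over an assumed three-type universe. The names `solve`, `TYPES`, and `SUPER` are illustrative and are not part of TYPPETE. Hard constraints must hold. Among the surviving assignments, the one satisfying the most soft constraints is returned.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, str]
Constraint = Callable[[Assignment], bool]

# Assumed toy universe: int and str are direct subclasses of object.
# "object" is listed first, so a plain first-model-found search picks it.
TYPES = ["object", "int", "str"]
SUPER = {"object": {"object"}, "int": {"int", "object"}, "str": {"str", "object"}}


def subtype(a: str, b: str) -> bool:
    return b in SUPER[a]


def solve(variables: List[str], hard: List[Constraint],
          soft: List[Constraint]) -> Tuple[Assignment, int]:
    """Brute-force stand-in for MaxSMT: every hard constraint must hold, and
    the model satisfying the most soft constraints wins (first one on ties)."""
    best, best_score = None, -1
    for values in product(TYPES, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in hard):
            score = sum(1 for c in soft if c(env))
            if score > best_score:
                best, best_score = env, score
    if best is None:
        raise ValueError("hard constraints are unsatisfiable")
    return best, best_score


# x = 0: the literal's type must be a subtype of x's type (hard),
# and x == int is the optional equality (soft).
hard = [lambda env: subtype("int", env["x"])]
soft = [lambda env: env["x"] == "int"]
print(solve(["x"], hard, []))    # ({'x': 'object'}, 0)
print(solve(["x"], hard, soft))  # ({'x': 'int'}, 1)
```

Without the soft constraint, the enumeration returns object first, which is valid but imprecise. With the soft constraint, the more precise type int wins.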


3 Constraint Solving 


TYPPETE relies on Z3 [7] and the MaxRes [18] algorithm for solving the gener- 
ated type constraints. We use e-matching [6] for instantiating the quantifiers used 
in the axiomatization of the subtype function (cf. Sect. 2), and carefully choose
instantiation patterns that ensure that any choice made during the search
immediately triggers the instantiation of the relevant quantifiers. For instance, for the
axioms shown in Sect. 2, we use the instantiation patterns $\mathit{subtype}(\mathit{class}_{Odd}, t)$ and
$\mathit{subtype}(t, \mathit{class}_{Odd})$, respectively. Our instantiation patterns ensure that as soon
as one argument of an application of the subtype function is known, the
quantifier that enumerates the possible values of the other argument is instantiated,
thus ensuring that the consequences of any type choices propagate immediately.
With a naive encoding, the solver would have to guess both arguments before
being able to check whether the subtype relation holds. The resulting constraint
solving process is much faster than it would be with other quantifier
instantiation techniques such as model-based quantifier instantiation [12], but
still avoids the potential unsoundness that can occur when using e-matching 
with insufficient trigger expressions. 

When the MAXSMT problem is satisfiable, TYPPETE queries Z3 for a model
satisfying all type constraints, retrieves the types assigned to each name in the
program, and generates type-annotated source code for the input program. For
instance, for the program shown in Fig. 1, TYPPETE automatically annotates the 
function evalEven with type Even for the parameter item and a str return type. 
Note that Item and object would also be correct type annotations for item; the 
choice of Even is guided by the optional type equality constraints. 

When the MAXSMT problem is unsatisfiable, instead of reporting the
unfulfilled constraints in the unsatisfiable core returned by Z3 (which is not
guaranteed to be minimal), TYPPETE creates a new relaxed MAXSMT problem where
only the constraints defining the subtype function are enforced, while all other
type constraints are optional. Z3 is then queried for a model satisfying as many
type constraints as possible. The resulting type-annotated source code for the
input program is returned along with the remaining minimal set of unfulfilled 
type constraints. For instance, if we remove the abstract method compete of class 
Item in Fig. 1, TYPPETE annotates the parameters of the function match at line 
20 with type object and indicates the call compete at line 21 as problematic. By 
observing the mismatch between the type annotations and the method call, the 
user has sufficient context to quickly identify and correct the type error. 
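The relaxed problem can be mimicked in the same brute-force style. In this sketch the names, the call sites, and the hierarchy are all hypothetical. As in TYPPETE's relaxed problem, the subtype facts stay fixed. Every other constraint becomes optional, and the sketch reports which constraints it had to give up.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, str]
Named = Tuple[str, Callable[[Assignment], bool]]

# Assumed hierarchy; the subtype facts below are fixed (always enforced).
TYPES = ["object", "Item", "Odd", "Even"]
SUPER = {"object": {"object"}, "Item": {"Item", "object"},
         "Odd": {"Odd", "Item", "object"}, "Even": {"Even", "Item", "object"}}
# Hypothetical scenario: Item.compete has been removed, so only Odd and
# Even still define compete().
HAS_COMPETE = {"Odd", "Even"}


def relaxed_solve(variables: List[str],
                  constraints: List[Named]) -> Tuple[Assignment, List[str]]:
    """Treat every type constraint as optional, maximize how many hold,
    and report the names of the ones left unfulfilled."""
    best: Optional[Assignment] = None
    best_missed: Optional[List[str]] = None
    for values in product(TYPES, repeat=len(variables)):
        env = dict(zip(variables, values))
        missed = [name for name, holds in constraints if not holds(env)]
        if best_missed is None or len(missed) < len(best_missed):
            best, best_missed = env, missed
    assert best is not None and best_missed is not None
    return best, best_missed


# p is the type of match's first parameter, called with an Odd and an Even.
constraints: List[Named] = [
    ("match(Odd(...), ...) argument", lambda env: env["p"] in SUPER["Odd"]),
    ("match(Even(...), ...) argument", lambda env: env["p"] in SUPER["Even"]),
    ("receiver of compete() at line 21", lambda env: env["p"] in HAS_COMPETE),
]
print(relaxed_solve(["p"], constraints))
# ({'p': 'object'}, ['receiver of compete() at line 21'])
```

As in the scenario described above, the parameter ends up typed as object, and the compete() call is the only reported problem. On ties the sketch keeps the first assignment it enumerates; a real MaxSMT solver may return any optimum.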
| Benchmark  | T(SMT)          | T(MaxSMT)       | Unfulfilled | T(Relaxed)      | PYTYPE  |
|------------|-----------------|-----------------|-------------|-----------------|---------|
| adventure  | 2.99s / 6.30s   | 3.27s / 6.76s   | 42 / 2      | 1.95s / 8.83s   | 0 [0]   |
| icemu      | 9.45s / 6.79s   | 9.51s / 3.63s   | 4 / 2       | 0.08s / 21.76s  | 18 [2]  |
| imp        | 16.88s / 59.95s | 16.91s / 15.87s | 67 / 2      | 0.82s / 82.56s  | 3 [2]   |
| scion      | 4.65s / 3.35s   | 4.72s / 2.97s   | 28 / 2      | 0.16s / 3.39s   | 0 [0]   |
| test suite | 14.66s / 1.63s  | 14.66s / 2.17s  | -           | -               | 55 [34] |


Fig. 2. Evaluation of TYPPETE on small programs and larger open source projects.


4 Experimental Evaluation 


In order to demonstrate the practical applicability of our approach, we evaluated 
our tool TYPPETE on a number of real-world open-source Python programs that 
use inheritance, operator overloading, and other features that are challenging for 
type inference (but not features that make static typing impossible): 


adventure [21]: An implementation of the Colossal Cave Adventure game (2 
modules, 399 LOC). The evaluation (and reported LOC) excludes the mod- 
ules game.py and prompt.py, which employ dynamic attribute additions. 

icemu [8]: A library that emulates integrated circuits at the logic level (8 mod- 
ules, 530 LOC). We conducted the evaluation on revision 484828f. 

imp [4]: A minimal interpreter for the imp toy language (7 modules, 771 LOC). 
The evaluation excludes the modules used for testing the project. 

scion [9]: A Python implementation of a new Internet architecture (2 modules, 
725 LOC). For the evaluation, we used path_store.py and scion_addr.py 
from revision 6f60ccc, and provided stub files for all dependencies. 


We additionally ran TYPPETE on our test suite of manually-written programs 
and small programs collected from the web (47 modules and 1998 LOC). 

In order to make the projects statically typeable, we had to make a num- 
ber of small changes that do not impact the functionality of the code, such as 
adding abstract superclasses and abstract methods, and (for the imp and scion 
projects) introducing explicit downcasts in a few places. Additionally, we made a
number of other innocuous changes to overcome the current limitations of our
tool, such as replacing keyword arguments with positional arguments, replacing
generator expressions with list comprehensions, and replacing super calls by
inlining. The complete list of changes for each project is included in our artifact.

The experiments were conducted on a 2.9 GHz Intel Core i5 processor with
8GB of RAM running macOS High Sierra version 10.13.3 with Z3 version
4.5.1. Figure 2 summarizes the results of the evaluation. The first two columns
show the average running time (over ten runs, split into constraint generation
and constraint solving) for the type inference in which the use of optional type
equality constraints (cf. Sect. 2) is disabled (SMT) and enabled (MAXSMT),
respectively. We observe that optional type equality constraints (considerably)
reduce the search space for a solution, as disabling them significantly
increases the running time for larger projects. We also note that constraint
solving is comparatively fast for the test suite despite its size; the test suite
consists of many independent modules. This suggests that
splitting the type inference problem into independent sub-problems could
further improve performance. We plan to investigate this direction as part of our
future work.

The third column of Fig. 2 shows the evaluation of the error reporting feature 
of TYPPETE (cf. Sect. 3). For each benchmark, we manually introduced two type 
errors that could organically happen during programming and compared the 
size of the unsatisfiable core (left of /) and the number of remaining unfulfilled 
constraints (right of /) for the original and relaxed MAXSMT problems given 
to Z3, respectively. We also list the times needed to prove the first problem 
unsatisfiable and solve the relaxed problem. As one would expect, the number 
of constraints that remain unfulfilled for the relaxed problems is considerably 
smaller, which demonstrates that the error reporting feature of TYPPETE greatly 
reduces the time that a user needs to identify the source of a type error. 

Finally, the last column of Fig. 2 shows the result of the comparison of
TYPPETE with the state-of-the-art type inferencer PYTYPE [16]. PYTYPE infers
PEP 484 [25] gradual type annotations by abstract interpretation [5] of the
bytecode-compiled version of the given Python file. In Fig. 2, for the considered
benchmarks, we report the number of variables and parameters that PYTYPE
leaves untyped or annotates with Any. We excluded any module on which
PYTYPE yields an error; in square brackets we indicate the number of
modules that we could consider. TYPPETE is able to fully type all elements and thus
outperforms PYTYPE for static typing purposes. On the other hand, we note that 
PYTYPE additionally supports gradual typing and a larger Python subset. 


5 Related and Future Work 


In addition to PYTYPE, a number of other type inference approaches and tools 
have been developed for Python. The approach of Maia et al. [17] has some 
fundamental limitations such as not allowing forward references or overloaded 
functions and operators. Fritz and Hage [11] as well as STARKILLER [22] infer sets 
of concrete types that can inhabit each program variable to improve execution 
performance. The former sacrifices soundness to handle more dynamic features of 
Python. Additionally, deriving valid type assignments from sets of concrete types 
is non-trivial. MyPy and a project by Cannon [3] can perform (incomplete) type 
inference for local variables, but require type annotations for function parameters 
and return types. PYANNOTATE [13] dynamically tracks variable types during 
execution and optionally annotates Python programs; the resulting annotations 
are not guaranteed to be sound. A similar spectrum of solutions exists for other 
dynamic programming languages like JavaScript [2,14] and ActionScript [20]. 
The idea of using SMT solvers for type inference is not new. Both F* [24] and 
LiquidHaskell [26] (partly) use SMT-solving in the inference for their dependent 
type systems. Pavlinovic et al. [19] present an SMT encoding of the OCaml type
system. TYPPETE’s approach to type error reporting can be seen as a simple
instantiation of their approach.


As part of our future work, we want to explore whether our system can be
adapted to infer gradual types. We also aim to develop heuristics for inferring
which functions and classes should be annotated with generic types based on the 
reported unfulfilled constraints. Finally, we plan to explore the idea of splitting 
the type inference into multiple separate problems to improve performance. 


Acknowledgments. We thank the anonymous reviewers for their feedback. This work 
was supported by an ETH Zurich Career Seed Grant (SEED-32 16-2). 
Abstract. JKIND is an open-source industrial model checker developed
by Rockwell Collins and the University of Minnesota. JKIND uses mul- 
tiple parallel engines to prove or falsify safety properties of infinite state 
models. It is portable, easy to install, performance competitive with other 
state-of-the-art model checkers, and has features designed to improve the 
results presented to users: inductive validity cores for proofs and coun- 
terexample smoothing for test-case generation. It serves as the back-end 
for various industrial applications. 


1 Introduction 


JKIND is an open-source¹ industrial infinite-state inductive model checker for
safety properties. Models and properties in JKIND are specified in LUSTRE [17], 
a synchronous data-flow language, using the theories of linear real and integer 
arithmetic. JKIND uses SMT-solvers to prove and falsify multiple properties in 
parallel. A distinguishing characteristic of JKIND is its focus on the usability of 
results. For a proven property, JKIND provides traceability between the prop- 
erty and individual model elements. For a falsified property, JKIND provides 
options for simplifying the counterexample in order to highlight the root cause 
of the failure. In industrial applications, we have found these additional usability 
aspects to be at least as important as the primary results. Another important 
characteristic of JKIND is that it is designed to be integrated directly into
user-facing applications. Written in Java, JKIND runs on all major platforms and
is easily compiled into other Java applications. JKIND bundles the Java-based
SMTINTERPOL solver and has no external dependencies. However, it can optionally
call Z3, YICES 1, YICES 2, CVC4, and MATHSAT if they are available.


2 Functionality and Main Features 


JKIND is structured as several parallel engines that coordinate to prove prop- 
erties, mimicking the design of PKIND and KIND 2 [8,21]. Some engines are 


1 https://github.com/agacek/jkind.


© The Author(s) 2018 
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 20-27, 2018. 
https://doi.org/10.1007/978-3-319-96142-2_3


The JKIND Model Checker 21 


[Diagram: engines, including the base step and k-Induction, exchange invariants and valid/invalid properties.]

Fig. 1. JKIND engine architecture


directly responsible for proving properties, others aid that effort by generating 
invariants, and still others are reserved for post-processing of proof or coun- 
terexample results. Each engine can be enabled or disabled separately based 
on the user’s needs. The architecture of JKIND allows any engine to broadcast 
information to the other engines (for example, lemmas, proofs, counterexamples) 
allowing straightforward integration of new functionality. 

The solving engines in JKIND are shown in Fig. 1. The Bounded Model
Checking (BMC) engine performs a standard iterative unrolling of the transi- 
tion relation to find counterexamples and to serve as the base case of k-induction. 
The BMC engine guarantees that any counterexample it finds is minimal in 
length. The k-Induction engine performs the inductive step of k-induction, 
possibly using invariants generated by other engines. The Invariant Genera- 
tion engine uses a template-based invariant generation technique [22] using its 
own k-induction loop. The Property Directed Reachability (PDR) engine 
performs property directed reachability [11] using the implicit abstraction tech- 
nique [9]. Unlike BMC and k-induction, each property is handled separately by a 
different PDR sub-engine. Finally, the Advice engine produces invariants based 
on previous runs of JKIND as described in the next section. 
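The interplay between the BMC base case and the k-induction step can be sketched over a small explicit-state system. This illustrates only the textbook technique, under simplifying assumptions. States are enumerated explicitly instead of being encoded for an SMT solver, and no auxiliary invariants are used. The names `paths` and `k_induction` are hypothetical.

```python
from typing import Callable, List, Sequence

State = int
Path = List[State]


def paths(states: Sequence[State], init: Callable[[State], bool],
          trans: Callable[[State, State], bool], length: int,
          from_init: bool) -> List[Path]:
    """All sequences of `length` states linked by trans (optionally starting
    in an initial state)."""
    frontier = [[s] for s in states if init(s) or not from_init]
    for _ in range(length - 1):
        frontier = [p + [t] for p in frontier for t in states if trans(p[-1], t)]
    return frontier


def k_induction(states, init, trans, prop, max_k: int):
    for k in range(1, max_k + 1):
        # Base case (BMC): check the k-th state of every path from an initial
        # state. Shorter paths were checked in earlier iterations, so the first
        # counterexample found has minimal length.
        for p in paths(states, init, trans, k, from_init=True):
            if not prop(p[-1]):
                return ("invalid", p)
        # Inductive step: k consecutive states satisfying prop are always
        # followed by a state satisfying prop.
        if all(prop(t)
               for p in paths(states, init, trans, k, from_init=False)
               if all(prop(s) for s in p)
               for t in states if trans(p[-1], t)):
            return ("valid", k)
    return ("unknown", max_k)


STATES = list(range(8))


def init(s: State) -> bool:
    return s == 0


def trans(s: State, t: State) -> bool:
    # 0 -> 1 -> 2 -> 3 -> 3 -> ...; 4 -> 5 -> 6 -> 7 -> 0 is unreachable.
    return t == (3 if s == 3 else (s + 1) % 8)


print(k_induction(STATES, init, trans, lambda s: s != 5, 5))  # ('valid', 2)
print(k_induction(STATES, init, trans, lambda s: s != 2, 5))  # ('invalid', [0, 1, 2])
```

The property s != 5 is not 1-inductive, because the unreachable state 4 steps to 5. It is 2-inductive, because state 4 has no predecessor and so no path of two states ends in 4.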

Invariant sharing between the solvers (shown in Fig. 1) is an important part 
of the architecture. In our internal benchmarking, we have found that implicit 
abstraction PDR performs best when operating over a single property at a time 
and without use of lemmas generated by other approaches. On the other hand, 
the invariants generated by PDR and template lemma generation often allow 
k-induction, which operates on all properties in parallel, to substantially reduce 
the verification time required for models with large numbers of properties. 


2.1 Post Processing and Re-verification 


A significant part of the research and development effort for JKIND has focused 
on post-processing results for presentation and repeated verification of models 
under development. 


Inductive Validity Cores (IVC). For a proven property, an inductive valid- 
ity core is a subset of LUSTRE equations from the input model for which the 
property still holds [13,14]. Inductive validity cores can be used for traceability
from property to model elements and determining coverage of the model by a 
set of properties [15]. This facility can be used to automatically generate trace- 
ability and adequacy information (such as traceability matrices [12] important 
to the certification of safety-critical avionics systems [26]). The IVC engine uses 
a heuristic algorithm to efficiently produce minimal or nearly minimal cores. In 
a recent experiment over a superset of the benchmark models described in the 
experiment in Sect.3, we found that our heuristic IVC computation added 31% 
overhead to model checking time, and yielded cores approximately 8% larger 
than the guaranteed minimal core computed by a very expensive “brute force” 
algorithm. As a side-effect, the IVC algorithm also minimizes the set of invariants 
used to prove a property and emits this reduced set to other engines (notably 
the Advice engine, described below). 
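The text does not detail JKIND's heuristic IVC algorithm, so the sketch below shows the generic deletion-based approach instead. It tries removing each equation and keeps the removal whenever the property is still provable. The result is a minimal (irreducible) core, not necessarily a minimum one, and the method needs one proof attempt per equation. The function `proves` is a hypothetical stand-in for a model-checking call.

```python
from typing import Callable, FrozenSet, List


def minimal_core(equations: List[str],
                 proves: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Deletion-based minimization: drop each equation in turn and keep the
    drop whenever the property remains provable from what is left."""
    core = list(equations)
    if not proves(frozenset(core)):
        raise ValueError("property is not provable from the full model")
    for eq in list(equations):
        candidate = [e for e in core if e != eq]
        if proves(frozenset(candidate)):
            core = candidate
    return core


# Toy oracle: the property follows from eq1 and eq3 together.
def proves(subset: FrozenSet[str]) -> bool:
    return {"eq1", "eq3"} <= subset


print(minimal_core(["eq1", "eq2", "eq3", "eq4"], proves))  # ['eq1', 'eq3']
```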


Smoothing. To aid in counterexample understanding and in creating structural 
coverage tests that can be more easily explained, JKIND provides an optional 
post-processing step to minimize the number of changes to input variables— 
smoothing the counterexample. For example, applied to 129 test cases generated 
for a production avionics flight control state machine, smoothing increased run- 
time by 40% and removed 4 unnecessary input changes per test case on aver- 
age. The smoothing engine uses a MAXSAT query over the original BMC-style 
unrolling of the transition relation combined with weighted assertions that each 
input variable does not change on each step. The MAXSAT query tries to satisfy 
all of these weighted assertions, but will break them if needed. This has the effect 
of trying to hold all inputs constant while still falsifying the original property 
and only allowing inputs to change when needed. This engine is only available 
with SMT-solvers that support MAXSAT such as YICES 1 and Z3. 
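A brute-force stand-in makes the smoothing objective concrete. Instead of a MAXSAT query with weighted assertions, the sketch enumerates all input sequences over a small assumed domain and keeps a falsifying one with the fewest input changes. The accumulator model and the names `changes` and `smooth` are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Sequence, Tuple


def changes(inputs: Sequence[int]) -> int:
    """Number of steps at which the input differs from the previous step."""
    return sum(a != b for a, b in zip(inputs, inputs[1:]))


def smooth(domain: Iterable[int], length: int,
           falsifies: Callable[[Tuple[int, ...]], bool]) -> Optional[Tuple[int, ...]]:
    """Among all input sequences of `length` steps that falsify the property,
    return one with the fewest input changes (None if there is none)."""
    best = None
    for seq in product(domain, repeat=length):
        if falsifies(seq) and (best is None or changes(seq) < changes(best)):
            best = seq
    return best


# Toy model: an accumulator adds the input at each step; the property
# "accumulator < 6" is falsified when the inputs sum to at least 6.
print(smooth(range(4), 3, lambda seq: sum(seq) >= 6))  # (2, 2, 2)
```

An arbitrary counterexample such as (0, 3, 3) would also falsify the property. Smoothing prefers the constant sequence because it has no input changes.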


Advice. The advice engine saves and re-uses the invariants that were used by 
JKIND to prove the properties of a model. Prior to analysis, JKIND performs 
model slicing and flattening to generate a flat transition-relation model. Inter- 
nally, invariants are stored as a set of proven formulas (in the LUSTRE syntax) 
over the variables in the flattened model. An advice file is simply the emitted 
set of these invariant formulas. When a model is loaded, the formulas are loaded 
into memory. Formulas that are no longer syntactically or type correct are
discarded, and the remaining set of formulas is submitted as an initial set of
possible invariants to be proved via k-induction. If they are proved, they are
passed along to other engines; if falsified, they are discarded. Names constructed
between multiple runs of JKIND are stable, so if a model is unchanged, it can
usually be re-proved quickly using the invariants and k-induction. If the model is
slightly changed, it is often the case that most of the invariants can be re-proved, 
leading to reduced verification times. 

If the IVC engine is also enabled, then advice emits a (close to) minimal 
set of lemmas used for proof; this often leads to faster re-verification (but more 
expensive initial verification), and can be useful for examining which of the 
generated lemmas are useful for proofs. 
Fig. 2. Performance benchmarks


3 Experimental Evaluation 


We evaluated the performance of JKIND against KIND 2 [8], ZUSTRE [20],
Generalized PDR in Z3 [19], and IC3 in NUXMV [9]. We used the default options
for each tool (using check_invar_ic3 for NUXMV). Our benchmark suite comes
from [9] and contains 688 models over the theory of linear integer arithmetic².
All experiments were performed on a 64-bit Ubuntu 17.10 Linux machine with 
a 12-core Intel Xeon CPU E5-1650 v3 @ 3.50 GHz, with 32GB of RAM and a 
time limit of 60s per model. 

Performance comparisons are shown in Fig. 2. The key gives the number
of benchmarks solved by each tool, and the graph shows the aggregate time
required for solving, with each tool's problems sorted independently by their
solving time. JKIND was able to verify or falsify the most properties,
although Z3 was often the fastest tool. Many of the benchmarks in this set
are quickly evaluated: Z3 solves the first 400 benchmarks in just over 12s. Due
to JKIND’s use of Java, the JVM/JKIND startup time for an empty model is
approximately 0.35s, which leads to poor performance on small models³. As
always, such benchmarks should be taken with a large grain of salt. In [8], a
different set of benchmarks slightly favored KIND 2, and in [9], NUXMV was the
most capable tool. We believe that all the solvers are relatively competitive.


4 Integration and Applications 


JKIND is the back-end for a variety of user-facing applications. In this section, 
we briefly highlight a few of these applications and how they employ the features 
discussed previously. 


2 https://es.fbk.eu/people/griggio/papers/tacas14-ic3ia.tar.bz2. Note that we
removed 263 duplicate benchmarks from the original set.
3 Without startup time, the curve for JKIND is close to the curve for ZUSTRE.
The Specification and Analysis of Requirements (SPEAR) tool is an open-source
tool for prototyping and analysis of requirements [12]. Starting from
a set of formalized requirements, SPEAR uses JKIND to determine whether 
or not the requirements meet certain properties. It uses IVCs to create 
a traceability matrix between requirements and properties, highlighting 
unused requirements, over-constrained properties, and other common prob- 
lems. SPEAR also uses JKIND with smoothing for test-case generation using 
the Unique First Cause criteria [28]. 

The Assume Guarantee Reasoning Environment (AGREE) tool is an 
open-source compositional verification tool that proves properties of 
hierarchically-composed models in the Architectural Analysis and Design 
Language (AADL) [3,10,23]. AGREE makes use of multiple
JKIND features including smoothing to present clear counterexamples, IVC
to show requirements traceability, and counterexample generation to check 
the consistency of an AADL component’s contract. AGREE also uses 
JKIND for test-case generation from component contracts. 

The Static IMPerative AnaLyzer (SIMPAL) tool is an open-source tool for 
compositional reasoning over software [27]. SIMPAL is based on LIMP, a 
LUSTRE-like imperative language with extensions for control flow elements,
global variables, and a syntax for specifying preconditions, postconditions, 
and global variable interactions of preexisting components. SIMPAL trans- 
lates LIMP programs to an equivalent LUSTRE representation which is passed 
to JKIND to perform assume-guarantee reasoning, reachability, and viability 
analyses. 

JKIND is also used by two proprietary tools used by product areas within 
Rockwell Collins. The first is a Mode Transition Table verification tool used 
for the complex state machines which manage flight modes of an aircraft. 
JKIND is used to check properties and generate tests for mode and transition
coverage from LUSTRE models generated from the state machines. IVCs
are used to establish traceability, i.e. which transitions are covered by which 
properties. The second is a Crew Alerting System MC/DC test-case gener- 
ation tool for a proprietary domain-specific language used for messages and 
alerts to airplane pilots. Smoothing is very important in this context as test 
cases need to be run on the actual hardware where timing is not precisely 
controllable. Thus, test cases with a minimum of changes to the inputs are 
ideal. 


5 Related Work


JKIND is one of a number of similar infinite-state inductive model checkers
including KIND 2 [8], NUXMV [9], Z3 with generalized PDR [19], and
ZUSTRE [20]. They operate over a transition relation described either as a LUSTRE
program (KIND 2, JKIND, and ZUSTRE), an extension of the SMV language
(NUXMV), or as a set of Horn clauses (Z3). Each tool uses a portfolio-based solver
approach, with NUXMV, JKIND, and KIND 2 all supporting both k-induction
and a variant of PDR/IC3. NUXMV also supports guided reachability and
k-liveness. Other tools such as ESBMC-DEPTHK [25], VVT [4], CPACHECKER [5],
and CPROVER [7] use similar techniques for reasoning about C programs.

We believe that the JKIND IVC support is similar to proof-core support 
provided by commercial hardware model checkers: Cadence Jasper Gold and 
Synopsys VC Formal [1,2,18]. The proof-core provided by these tools is used for 
internal coverage analysis measurements performed by the tools. Unfortunately, 
the algorithms used in the commercial tool support are undocumented and per- 
formance comparisons are prohibited by the tool licenses, so it is not possible to 
compare performance on this aspect. 

Previous work has been done on improving the quality of counterexamples 
along various dimensions similar to the JKIND notion of smoothing, e.g. [16,24]. 
Our work is distinguished by its focus on minimizing the number of deltas in 
the input values. This metric has been driven by user needs and by our own 
experiences with test-case generation. 

There are several tools that support reuse or exchange of verification results, 
similar to our advice feature. Recently, there has been progress on standardized 
formats [6] of exchange between analysis tools. Our current advice format is 
optimized for use and performance with our particular tool and designed for re- 
verification rather than exchange of partial verification information. However, 
supporting a standardized format for exchanging verification information would 
be a useful feature for future use. 


6 Conclusion 


JKIND is similar to a number of other solvers that each solve infinite state 
sequential analysis problems. Nevertheless, it has some important features that 
distinguish it. First, it focuses on the quality of feedback to users for both valid
properties (using IVCs) and invalid properties (using smoothing). Second, it is
supported across all major platforms and is straightforward to port due to its
implementation in Java. Third, it is small, modular, and well-architected, allowing
straightforward extension with new engines. Fourth, it is open-source with a 
liberal distribution license (BSD), so it can be adapted for various purposes, as 
demonstrated by the number of tools that have incorporated it. 


Acknowledgments. The work presented here was sponsored by DARPA as part of 
the HACMS program under contract FA8750-12-9-0179. 
Abstract. In this paper we describe the DEEPSEC prover, a tool for
security protocol analysis. It decides equivalence properties modelled as 
trace equivalence of two processes in a dialect of the applied pi calculus. 


1 Introduction 


Cryptographic protocols ensure the security of communications. They are dis- 
tributed programs that make use of cryptographic primitives, e.g. encryption, 
to ensure security properties, such as confidentiality or anonymity. Their correct 
design is quite a challenge as security is to be enforced in the presence of an 
arbitrary adversary that controls the communication network and may compro- 
mise participants. The use of symbolic verification techniques, in the line of the 
seminal work by Dolev and Yao [19], has proven its worth in discovering logical 
vulnerabilities or proving their absence. 

Mature tools exist nowadays, e.g. [7,10,24], but they mostly concentrate on trace
properties, such as authentication and (weak forms of) confidentiality.
Unfortunately, many properties need to be expressed in terms of indistinguishability,
modelled as behavioral equivalences in dedicated process calculi. Typically, a 
strong version of secrecy states that the adversary cannot distinguish the sit- 
uation where a value v1, respectively v2, is used in place of a secret. Privacy 
properties, e.g., vote privacy, are also stated similarly [2,4,18]. 

In this paper we present the DEEPSEC prover (Deciding Equivalence Proper- 
ties in Security protocols). The tool decides trace equivalence for cryptographic 
protocols that are specified in a dialect of the applied pi calculus [1]. DEEPSEC 
offers several advantages over existing tools, in terms of expressiveness, preci- 
sion and efficiency: typically we do not restrict the use of private channels, allow 
else branches, and decide trace equivalence precisely, i.e., no approximations 
are applied. Cryptographic primitives are user specified by a set of subterm- 
convergent rewrite rules. The only restriction we make on protocol specifications 
is that we forbid unbounded replication, i.e. we restrict the analysis to a finite
number of protocol sessions. This restriction is similar to that of several other
tools and sufficient for decidability. Note that decidability is nevertheless
non-trivial as the system under study is still infinite-state due to the active,
arbitrary attacker participating in the protocol.


2 Description of the Tool 


2.1 Example: The Helios Voting Protocol 


An input of DEEPSEC defines the cryptographic primitives, the protocol and
the security properties that are to be verified. Random numbers are abstracted
by names (a,b,...), cryptographic primitives by function symbols with arity
(f/n) and messages by terms viewed as modus operandi to compute
bitstrings. For instance, the functions aenc/3, pk/1 model randomized asymmetric
encryption and public-key generation: term aenc(pk(k),r,m) models the plain 
text m encrypted with public key pk(k) and randomness r. In DEEPSEC we 
write: 


fun aenc/3. fun pk/1. 


On the other hand, cryptographic destructors are specified by rewrite rules. For 
example asymmetric decryption (adec) would be defined by 


reduc adec(k,aenc(pk(k),r,m)) -> m. 


A plain text m can thus be retrieved from a cipher aenc(pk(k),r,m) and the 
corresponding private key k. Such user-defined rewrite rules also allow us to 
describe more complex primitives such as a zero-knowledge proof (ZKP) assert- 
ing knowledge of the plaintext and randomness of a given ciphertext: 


fun zkp/3. 
const zkpok.
reduc check(zkp(r,v,aenc(p,r,v)), aenc(p,r,v)) -> zkpok. 


Although user-defined, the rewrite system is required by DEEPSEC to be subterm
convergent, i.e., convergent and such that the right-hand side of each rule is a
subterm of its left-hand side or a ground term in normal form. Support for tuples
and projection is provided by default.


Protocol Specification. Honest participants in a protocol are modeled as
processes. For instance, the process Voter(auth,id,v,pkE) describes a voter in
the Helios voting protocol. The process has four arguments: an authenticated
channel auth, the voter's identifier id, her vote v, and the public key pkE of
the tally.
let Voter(auth,id,v,pkE) =
  new r;
  let bal = aenc(pkE,r,v) in
  out(auth, bal);
  out(c, (id, bal, zkp(r,v,bal))).

The voter first generates a random number r that will be used for encryption and
for the ZKP. After that, she encrypts her vote and assigns it to the variable bal,
which is output on the channel auth. Finally, she outputs the ballot, her id and
the corresponding ZKP on the public channel c.

let VotingSystem(v1,v2) =
  new k; new auth1; new auth2;
  out(c,pk(k));
  ( Voter(auth1,id1,v1,pk(k)) |
    Voter(auth2,id2,v2,pk(k)) |
    Tally(k,auth1,auth2) ).

All in all, the process VotingSystem(v1,v2) represents the complete voting scheme:
two honest voters id1 and id2 respectively vote for v1 and v2; the process Tally
collects the ballots, checks the ZKP and outputs the result of the election. The
instances of the processes Voter and Tally are executed concurrently, modeled by
the parallel operator |. Other operators supported by DEEPSEC include input on a
channel (in(c,x); P), conditional (if u = v then P else Q) and non-deterministic
choice (P + Q).


Security Properties. DEEPSEC focuses on properties modelled as trace
equivalence, e.g., vote privacy [18] in the Helios protocol. We express it as the
indistinguishability of two instances of the protocol that swap the votes of two
honest voters:


query trace_equiv(VotingSystem(yes,no) , VotingSystem(no,yes)). 


DEEPSEC checks whether an attacker, implicitly modelled by the notion of 
trace equivalence, cannot distinguish between these two instances. Note that all 
actions of dishonest voters can be seen as actions of this single attacker entity; 
thus only honest participants need to be specified in the input file. 


2.2 The Underlying Theory 


We give here a high-level overview of how DEEPSEC decides trace equivalence.
Further intuition and details can be found in [14]. 


Symbolic Setting. Although finite-depth, even non-replicated protocols have an
infinite state space. Indeed, a simple input in(c,x) induces infinitely many
potential transitions in the presence of an active attacker. We therefore define a
symbolic calculus that abstracts concrete inputs by symbolic variables, together
with constraints that restrict their concrete instances. Constraints typically
range over deducibility constraints ("the attacker is able to craft some term
after spying on public channels") and equations ("two terms are equal"). A
symbolic semantics then performs symbolic inputs and collects constraints on them.
Typically, executing the input in(c,x) generates a deducibility constraint on x,
modelling the attacker's ability to craft the message to be input; equations are
generated by conditionals, relying on most general unifiers modulo the equational
theory.
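Equations produced by conditionals are solved through most general unifiers. DEEPSEC computes them modulo the equational theory induced by the rewrite rules; the Python sketch below deliberately simplifies this to plain syntactic unification (Robinson-style, with an occurs check), which is enough to show how an equation u = v becomes a substitution on the symbolic input variables. The Var class and the tuple encoding of terms are hypothetical choices for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A symbolic variable, e.g. the x bound by an input in(c,x)."""
    name: str

# Other terms: a plain str is a name or constant; (f, t1, ..., tn) is an application.

def walk(term, subst):
    """Follow variable bindings in a triangular substitution."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term

def occurs(var, term, subst):
    term = walk(term, subst)
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, arg, subst) for arg in term[1:])

def unify(u, v, subst=None):
    """Return a most general (syntactic) unifier extending subst, or None."""
    subst = {} if subst is None else subst
    u, v = walk(u, subst), walk(v, subst)
    if u == v:
        return subst
    if isinstance(u, Var):
        return None if occurs(u, v, subst) else {**subst, u: v}
    if isinstance(v, Var):
        return None if occurs(v, u, subst) else {**subst, v: u}
    if isinstance(u, tuple) and isinstance(v, tuple) and u[0] == v[0] and len(u) == len(v):
        for a, b in zip(u[1:], v[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # clash between distinct names or function symbols

def resolve(term, subst):
    """Apply the substitution fully to a term."""
    term = walk(term, subst)
    if isinstance(term, tuple):
        return (term[0],) + tuple(resolve(arg, subst) for arg in term[1:])
    return term
```

A failed unification corresponds to a branch where the conditional cannot hold, so only its else branch needs to be explored.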


Decision Procedure. DEEPSEC constructs a so-called partition tree to guide the
decision of (in)equivalence of processes P and Q. Its nodes are labelled by sets of
symbolic processes and constraints; typically the root contains P and Q with
empty constraints. The tree is constructed similarly to the (finite) tree of all
symbolic executions of P and Q, except that some nodes may be merged or
split according to a constraint-solving procedure. DEEPSEC thus enforces that
concrete instances of processes in the same node are (statically) indistinguishable.

The final decision criterion is that P and Q are equivalent iff every node of the
partition tree contains both a process originating from P and a process originating
from Q by symbolic execution. The DEEPSEC prover thus returns an attack iff
it finds a node violating this property while constructing the partition tree.
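The final criterion can be read as a simple tree traversal. In the Python sketch below (a toy model, not DEEPSEC's data structures), each node only records, for every symbolic process it contains, whether that process originates from P or from Q; constraints and the construction of the tree itself are omitted. The traversal returns the first node whose processes all come from one side, which corresponds to an attack, or None when the criterion holds everywhere.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    # (origin, label) pairs: origin is "P" or "Q"; label is a free-form description.
    processes: List[Tuple[str, str]]
    children: List["Node"] = field(default_factory=list)

def find_violation(root: Node) -> Optional[Node]:
    """Depth-first search for a node lacking a process from P or a process from Q."""
    stack = [root]
    while stack:
        node = stack.pop()
        origins = {origin for origin, _ in node.processes}
        if not {"P", "Q"} <= origins:
            return node  # one side can act here while the other cannot
        stack.extend(reversed(node.children))
    return None

def equivalent(root: Node) -> bool:
    return find_violation(root) is None
```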


2.3 Implementation 


DEEPSEC is implemented in OCaml (16k LOC), and the source code is licensed
under GPL 3.0 and publicly available [17]. Running DEEPSEC yields a terminal
output summarising the results, while a more detailed output is displayed
graphically in an HTML interface (using the MathJax API [20]). When the query is
not satisfied, the interface interactively shows how to mount the attack.


Partial-Order Reductions. Tools verifying equivalences for a bounded number of
sessions suffer from a combinatorial explosion as the number of sessions increases.
We therefore implemented state-of-the-art partial-order reductions (POR) [8]
that eliminate redundant interleavings, providing a significant speedup. This
is only possible for a restricted class of processes (determinate processes), but
DEEPSEC automatically checks whether POR can be activated.


Parallelism. DEEPSEC generates a partition tree (cf. Sect. 2.2) to decide trace
equivalence. As sibling nodes are independent, the computation on subtrees can
be parallelized. However, the partition tree is not balanced, which makes load
balancing hard. One natural solution would be to systematically add child
nodes to a queue of pending jobs, but this would incur a significant
communication overhead. Consequently, we apply this method only until the size of
the queue exceeds a given threshold; after that, each idle process fetches a node
and computes the complete corresponding subtree. Distributed computation over n
cores is activated by the option -distributed n. By default, the threshold used in
the initial generation of the partition tree depends on n, but it may be set to m
with the option -nb_sets m.
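The splitting strategy can be sketched in a few lines of Python, under simplifying assumptions: a generic children function stands in for partition-tree construction, counting nodes stands in for the real per-subtree work, a thread pool stands in for DEEPSEC's worker processes (Python threads would not give real CPU parallelism here), and the default threshold of 2 * n_workers is a hypothetical choice, not DEEPSEC's.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Optional, Tuple

Children = Callable[[Any], Iterable[Any]]

def split_frontier(root: Any, children: Children, threshold: int) -> Tuple[List[Any], int]:
    """Expand nodes breadth-first until at least `threshold` jobs are pending.

    Returns the pending jobs and the number of nodes already expanded during splitting."""
    queue = deque([root])
    expanded = 0
    while queue and len(queue) < threshold:
        node = queue.popleft()
        expanded += 1
        queue.extend(children(node))
    return list(queue), expanded

def explore_subtree(node: Any, children: Children) -> int:
    """Stand-in for the real work: visit the complete subtree below node, counting nodes."""
    stack, count = [node], 0
    while stack:
        current = stack.pop()
        count += 1
        stack.extend(children(current))
    return count

def distributed_explore(root: Any, children: Children, n_workers: int,
                        threshold: Optional[int] = None) -> int:
    if threshold is None:
        threshold = 2 * n_workers  # hypothetical default, not DEEPSEC's rule
    jobs, expanded = split_frontier(root, children, threshold)
    # Each idle worker takes the next pending node and explores its whole subtree alone.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return expanded + sum(pool.map(lambda job: explore_subtree(job, children), jobs))
```

A larger threshold gives the pool more, smaller jobs to balance across workers, at the price of more work done centrally before distribution starts.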


3 Experimental Evaluation 


Comparison to Other Work. When the number of sessions is unbounded,
equivalence is undecidable. Verification tools in this setting therefore have to
sacrifice termination, and generally only verify the finer diff-equivalence
[9,11,23], which is too fine-grained on many examples. We therefore focus on tools
comparable to DEEPSEC, i.e., those that bound the number of sessions. SPEC [25,26]
verifies a sound symbolic bisimulation, but is restricted to fixed cryptographic
primitives (pairing, encryption, signatures, hashes) and does not allow for else
branches. APTE [13] covers the same primitives but allows else branches and
decides trace equivalence exactly. In contrast, AKISS [12] allows for user-defined
primitives and terminates when they form a subterm-convergent rewrite system.
However, AKISS only decides trace equivalence without approximation for a
subclass of processes (determinate processes) and may perform under- and
over-approximations otherwise. SAT-EQ [15] proceeds differently, by reducing the
equivalence problem to Graph Planning and SAT solving: the tool is more efficient
than the others by several orders of magnitude, but is quite restricted in scope
(it currently supports pairing and symmetric encryption, and can only analyse a
subclass of determinate processes). Besides, SAT-EQ may not terminate.


Authentication. Figure 1 displays a sample of our benchmarks (complete results
can be found in [17]). DEEPSEC clearly outperforms AKISS, APTE, and SPEC,
but SAT-EQ takes the lead as the number of sessions increases. However, the
Otway-Rees protocol already illustrates the scope limit of SAT-EQ.

Besides, as previously mentioned, DEEPSEC includes partial-order reductions
(POR). We performed experiments with and without this optimisation: for
example, protocols requiring more than 12h of computation time without POR can
be verified in less than a second with it. Note that AKISS and APTE also
implement the same POR techniques as DEEPSEC.


[Table: results and running times of AKISS, APTE, SPEC, SAT-EQ, DEEPSEC, and
DEEPSEC without POR on Denning-Sacco (3, 6, 7, 10, 12, and 29 roles),
Yahalom-Lowe (3, 6, 7, 10, and 17 roles), and Otway-Rees (3, 6, 7, and 14 roles).
Legend: equivalence proved; out of scope; out of memory/stack overflow;
timeout (12h).]
Fig. 1. Benchmark results on classical authentication protocols
[Left panel: results of AKISS, APTE, and DEEPSEC on privacy benchmarks
(including Passive Authentication) with 2, 4, 6, 7, 9, 15, and 21 roles.]

Helios variant (# roles)   | DEEPSEC
Vanilla (6)                | attack found, <1s
No revote, W (6)           | equivalence proved, 1s
No revote, ZKP (6)         | equivalence proved, 2s
Dishonest revote, W (10)   | equivalence proved, 30m 24s
Dishonest revote, ZKP (10) | equivalence proved, 9m 26s
Honest revote, W (11)      | attack found, 2s
Honest revote, ZKP (11)    | equivalence proved, 2h 42m

Legend: equivalence proved; attack found; timeout (12h).

Fig. 2. Benchmark results for verifying privacy type properties


Privacy. We also verified privacy properties on the private authentication
protocol [2], the passive-authentication and basic-access-control protocols from
the e-passport [21], AKA of the 3G telephony networks [6], and the voting
protocols Helios [3] and Prêt à Voter [22]. DEEPSEC is the only tool that can
prove vote privacy on the two voting protocols, and private authentication is out
of the scope of SAT-EQ and SPEC. Besides, we analysed variants of the Helios
voting protocol, based on the work of Arapinis et al. [5] (see Fig. 2). The vanilla
version is known to be vulnerable to a ballot-copy attack [16], which is patched by
ballot weeding (W) or a zero-knowledge proof (ZKP). DEEPSEC proved that both
patches are secure (i) when no revote is allowed, or (ii) when each honest voter
only votes once and a dishonest voter is allowed to revote. However, only the ZKP
variant remains secure when honest voters are allowed to revote.


Parallelism. Experiments have been carried out on a server with 40 Intel Xeon
E5-2687W v3 CPUs at 3.10 GHz, with 50 GB RAM and 25 MB L3 cache, using
35 cores (Server 1). However, parallelisation showed some unexpected behaviour.
For example, on the Yahalom-Lowe protocol, using too many cores on the same
server negatively impacts performance: on Server 1, optimal results are achieved
using only 20 to 25 cores. In comparison, optimal results required 40-45 cores on
a server with 112 Intel Xeon E7-4850 v3 CPUs at 2.20 GHz, with 1.5 TB RAM and
35 MB L3 cache (Server 2). This difference may be explained by cache capacity:
overloading a server with processes (which share the cache) beyond a certain
threshold should indeed make the cache hit/miss ratio drop. This is consistent
with Server 2 having a larger cache and efficiently exploiting more cores than
Server 1. Using the perf profiling tool, we confirmed that the number of cache
references per second (CRPS) stayed relatively stable up to the optimal number of
cores and quickly decreased beyond it (Fig. 3).

DEEPSEC can also distribute the computation over multiple servers, using SSH
connections. Despite a communication overhead, multi-server computation may be a
way to partially avoid the server-overload issue discussed above. For example, the
verification of the Helios protocol (Dishonest revote W) on 3 servers (using 10,
20, and 40 cores, respectively) resulted in a running time of 18m 14s, while the
same verification took 51m 49s on a 70-core server (also launched remotely via
SSH).

Fig. 3. Performance analysis on Yahalom-Lowe protocol with 23 roles
Abstract. We present a new safety hardware model checker, SimpleCAR,
that serves as a reference implementation for evaluating Complementary
Approximate Reachability (CAR), a new SAT-based model checking framework
inspired by classical reachability analysis. The tool gives a "bottom-line"
performance measure for comparing future extensions to the framework. We
demonstrate the performance of SimpleCAR on challenging benchmarks from the
Hardware Model Checking Competition. Our experiments indicate that SimpleCAR is
particularly suited for unsafety checking, or bug-finding; it is able to solve 7
unsafe instances within 1h that are not solvable within 8h by any other
state-of-the-art technique, including BMC and IC3/PDR. We also identify a bug
(reporting safe instead of unsafe) and 48 counterexample generation errors in the
tools compared in our analysis.


1 Introduction 


Model checking techniques are widely used in proving design correctness, and 
have received unprecedented attention in the hardware design community [9,16]. 
Given a system model M and a property P, model checking decides whether or
not P holds for M. A model checking algorithm exhaustively checks all behaviors
of M, and returns a counterexample as evidence if any behavior violates the
property P. The counterexample gives the execution of the system that leads to
property failure, i.e., a bug. In particular, if P is a safety property, model
checking reduces to reachability analysis, and the provided counterexample has a
finite length. Popular safety checking techniques include Bounded Model Checking
(BMC) [10], Interpolation Model Checking (IMC) [21], and IC3/PDR [12,14]. It
is well known that there is no "universal" algorithm in model checking; different
algorithms perform differently on different problem instances [7]. BMC
outperforms IMC on checking unsafe instances, while IC3/PDR can solve instances
that BMC cannot, and vice versa [19]. Therefore, BMC and IC3/PDR are the most
popular algorithms in the portfolio for unsafety checking, or bug-finding.
Complementary Approximate Reachability (CAR) [19] is a SAT-based model
checking framework for reachability analysis. In contrast to reachability analysis
via IC3/PDR, CAR maintains two sequences of over- and under-approximate
reachable state sets. The over-approximate sequence is used for safety checking,
and the under-approximate sequence for unsafety checking. Unlike IC3/PDR, CAR
does not require the over-approximate sequence to be monotone. Both forward
(Forward-CAR) and backward (Backward-CAR) reachability analysis are
permissible in the CAR framework. Preliminary results show that Forward-CAR
complements IC3/PDR on safe instances [19].

We present SimpleCAR, a tool specifically developed for evaluating and
extending the CAR framework. The new tool is a complete rewrite of CARChecker
[19] with several improvements and added capabilities. SimpleCAR has a lighter
and cleaner implementation than CARChecker, which integrates several heuristics
that help Forward-CAR complement IC3/PDR. Although useful, these heuristics
make it difficult to understand and extend the core functionalities of CAR. As
with IC3/PDR, the performance of CAR varies significantly depending on the
heuristics used [17]. Therefore, it is necessary to provide a basic implementation
of CAR (without code-bloating heuristics) that serves as a "bottom-line"
performance measure for all future extensions. To that end, SimpleCAR
differs from CARChecker in the following aspects:


- Eliminates all heuristics integrated in CARChecker except a configuration
option to enable an IC3/PDR-like clause "propagation" heuristic.

- Uses UNSAT cores from the SAT solver directly instead of the expensive
minimal UNSAT core (MUC) computation in CARChecker.

- Poses incremental queries to the SAT solver using assumptions.

- While CARChecker contributes to safety checking [19], SimpleCAR shows a
clear advantage on unsafety checking.


We apply SimpleCAR to 748 benchmarks from the Hardware Model Checking
Competition (HWMCC) 2015 [2] and 2017 [3], and compare its performance to
reachability analysis algorithms (BMC, IMC, 4 × IC3/PDR, Avy [22], Quip [18]) in
state-of-the-art model checking tools (ABC, nuXmv, IIMC, IC3Ref). Our extensive
experiments reveal that Backward-CAR is particularly suited for unsafety
checking: within a 1-h time limit it solves 8 instances that BMC and IC3/PDR
cannot, and 7 of these remain unsolved by BMC and IC3/PDR even with an 8-h
time limit. We conclude that, along with BMC and IC3/PDR, CAR is an important
candidate in the portfolio of unsafety checking algorithms, and SimpleCAR
provides an easy and efficient way to evaluate, experiment with, and add
enhancements to the CAR framework. We identify 1 major bug and 48 errors in
counterexample generation in our evaluated tool set; all have been reported to the
tool developers.


2 Algorithms and Implementation 


We present a very high-level overview of the CAR framework (refer to [19] for
details). CAR is a SAT-based framework for reachability analysis. It maintains
two reachable-state sequences, one over-approximate and one under-approximate,
for safety and unsafety checking, respectively. CAR can be symmetrically
implemented either in the forward (Forward-CAR) or backward (Backward-CAR)
mode. In the forward mode, the F-sequence ($F_0, F_1, \ldots, F_i$) is the
over-approximated sequence, while the B-sequence ($B_0, B_1, \ldots, B_i$) is
under-approximated. The roles of the F- and B-sequences are reversed in the
backward mode. We focus here on the backward mode of CAR, or Backward-CAR
(refer to [19] for Forward-CAR).


2.1 High-Level Description of Backward-CAR 


A frame $F_i$ in the F-sequence denotes the set of states that are reachable from
the initial states ($I$) in $i$ steps. Similarly, a frame $B_i$ in the B-sequence
denotes the set of states that can reach the bad states ($\neg P$) in $i$ steps. Let
$R(F_i)$ represent the set of successor states of $F_i$, and $R^{-1}(B_i)$ represent
the set of predecessor states of $B_i$. Table 1 shows the constraints on the
sequences and their usage in Backward-CAR for safety and unsafety checking.

Table 1. Sequences in Backward-CAR.

               | F-sequence (under)                               | B-sequence (over)
Init           | $F_0 = I$                                        | $B_0 = \neg P$
Constraint     | $F_{i+1} \subseteq R(F_i)$                       | $B_{i+1} \supseteq R^{-1}(B_i)$
Safety check   | -                                                | $\exists i \cdot B_{i+1} \subseteq \bigcup_{0 \le j \le i} B_j$
Unsafety check | $\exists i \cdot F_i \cap \neg P \neq \emptyset$ | -
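For intuition, the rows of Table 1 can be evaluated directly when states are enumerated explicitly. The Python sketch below is only such an explicit-state illustration: SimpleCAR represents frames symbolically and discharges these checks with SAT queries, and the integer state encoding and the successor-map representation of R are assumptions made for this example.

```python
from typing import Dict, List, Optional, Set

State = int                          # hypothetical explicit encoding of a state
Relation = Dict[State, Set[State]]   # successor map representing R

def image(states: Set[State], succ: Relation) -> Set[State]:
    """R(S): all successors of the states in S."""
    return {t for s in states for t in succ.get(s, set())}

def preimage(states: Set[State], succ: Relation) -> Set[State]:
    """R^{-1}(S): all states with at least one successor in S."""
    return {s for s, targets in succ.items() if targets & states}

def respects_table1(F: List[Set[State]], B: List[Set[State]], succ: Relation,
                    init: Set[State], bad: Set[State]) -> bool:
    """Init and Constraint rows: F_0 = I, F_{i+1} ⊆ R(F_i), B_0 = ¬P, B_{i+1} ⊇ R^{-1}(B_i)."""
    return (F[0] == init and B[0] == bad
            and all(F[i + 1] <= image(F[i], succ) for i in range(len(F) - 1))
            and all(B[i + 1] >= preimage(B[i], succ) for i in range(len(B) - 1)))

def unsafety_witness(F: List[Set[State]], bad: Set[State]) -> Optional[int]:
    """Unsafety check: the first i with F_i ∩ ¬P ≠ ∅, else None."""
    return next((i for i, frame in enumerate(F) if frame & bad), None)

def safety_witness(B: List[Set[State]]) -> Optional[int]:
    """Safety check: the first i with B_{i+1} ⊆ B_0 ∪ ... ∪ B_i, else None."""
    seen: Set[State] = set()
    for i in range(len(B) - 1):
        seen |= B[i]
        if B[i + 1] <= seen:
            return i
    return None
```

Because the B-frames over-approximate the states that can reach $\neg P$, a B-frame adding nothing new means no initial state can ever be shown to reach a bad state, which is why this check certifies safety.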

Let $S(F) = \bigcup_i F_i$ and $S(B) = \bigcup_i B_i$. Algorithm 1 gives a
description of Backward-CAR.

Alg. 1. High-level description of Backward CAR
1: $F_0 = I$, $B_0 = \neg P$, $k = 0$;
2: while true do
3:   while $S(B) \cap R(S(F)) \neq \emptyset$ do
4:     update F- and B- sequences.
5:     if $\exists i \cdot F_i \cap \neg P \neq \emptyset$ then return unsafe;
6:   perform propagation on B-sequence (optional);
7:   if $\exists i \cdot B_{i+1} \subseteq \bigcup_{0 \le j \le i} B_j$ then return safe;
8:   $k = k + 1$ and $B_k = \neg P$;

The B-sequence is extended exactly once in every iteration of the loop in
lines 2-8, but the F-sequence may be extended multiple times within each such
iteration, by the inner loop in lines 3-5. As a result, CAR normally returns
counterexamples whose depth exceeds the length of the B-sequence. Due to this
inherent feature of the framework, CAR is able to complement BMC and IC3/PDR
on unsafety checking.


2.2 Tool Implementation 


SimpleCAR is publicly available [5,6] under the GNU GPLv3 license. The tool 
implementation is as follows: 


- Language: C++11, compilable under gcc 4.4.7 or above.

- Input: Hardware circuit models expressed as and-inverter graphs in the aiger
1.9 format [11] containing a single safety property.

- Output: "1" (unsafe) to report that the system violates the property, or "0"
(safe) to confirm that the system satisfies the property. A counterexample in the
aiger format is generated if run with the -e configuration flag.

- Algorithms: Forward-CAR and Backward-CAR, with and without the propagation
heuristic (enabled using the -p configuration flag).

- External Tools: Glucose 3.0 [8] (based on MiniSAT [15]) is used as the
underlying SAT solver. Aiger tools [1] are used for parsing the input aiger
files to extract the model and property information, and for error checking.

- Differences with CARChecker [19]: The Minimal Unsat Core (MUC) and
Partial Assignment (PA) techniques are not utilized in SimpleCAR, which
allows the implementation to harness the power of incremental SAT solving.


3 Experimental Analysis 


3.1 Strategies 


Tools. In addition to SimpleCAR, we consider five model checking tools in our
evaluation: ABC 1.01 [13], IIMC 2.0¹, Simplic3 [17] (the IC3 algorithms used by
nuXmv for finite-state systems²), IC3Ref [4], and CARChecker [19]. For ABC, we
evaluate BMC (bmc2), IMC (int), and PDR (pdr). There are three different
versions of BMC in ABC: bmc, bmc2, and bmc3. We choose bmc2 based on our
preliminary analysis, since it outperforms the other versions. Simplic3 offers
different configuration options for IC3. We use the three best candidate
configurations for IC3 reported in [17], and the Avy algorithm [22] in Simplic3.
We consider CARChecker as the original implementation of the CAR framework and
use it as a reference implementation for SimpleCAR. A summary of the tools and
the arguments used for the experiments is shown in Table 2. Overall, we consider
four categories of algorithms implemented in the tools: BMC, IMC, IC3/PDR, and
CAR.


Benchmarks. We evaluate all tools against 748 benchmarks in the aiger format 
[11] from the SINGLE safety property track of the HWMCC in 2015 and 2017. 


Error Checking. We check the correctness of the results from the tools in two ways:


1. We use the aigsim [1] tool to check, by simulation, whether the counterexample
generated for an unsafe instance is a real counterexample.

2. For inconsistent results (safe and unsafe for the same benchmark by at least
two different tools), we attempt to simulate the unsafe counterexample and, if
successful, report an error for the tool that returns safe (surprisingly, we did
not encounter any case in which the simulation check failed).
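The simulation check can be illustrated with a small Python sketch. It is not aigsim: it works on a hypothetical in-memory circuit rather than parsing AIGER files, but it follows the AIGER literal convention (literal 2v denotes variable v, literal 2v+1 its negation, literal 0 is constant false) and assumes that every latch resets to 0. A counterexample is accepted if the bad literal evaluates to 1 at some step of the input trace.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

@dataclass
class Aig:
    """Hypothetical in-memory AIG using AIGER literal numbering."""
    inputs: List[int]                  # even literals of the primary inputs
    latches: List[Tuple[int, int]]     # (latch literal, next-state literal); reset assumed 0
    ands: List[Tuple[int, int, int]]   # (lhs, rhs0, rhs1) in topological order
    bad: int                           # literal that evaluates to 1 in a bad state

def lit_value(lit: int, values: Dict[int, int]) -> int:
    """Literal 2v reads variable v, literal 2v+1 reads its negation; literal 0 is false."""
    return values[lit & ~1] ^ (lit & 1)

def simulate(aig: Aig, trace: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the first step at which the bad literal is 1, or None if it never is."""
    state = {lit: 0 for lit, _ in aig.latches}
    for step, inputs in enumerate(trace):
        values = {0: 0, **state, **dict(zip(aig.inputs, inputs))}
        for lhs, rhs0, rhs1 in aig.ands:
            values[lhs] = lit_value(rhs0, values) & lit_value(rhs1, values)
        if lit_value(aig.bad, values):
            return step
        state = {lit: lit_value(nxt, values) for lit, nxt in aig.latches}
    return None
```

For instance, a circuit with input literal 2, latch literal 4 whose next state is literal 7, one AND gate 6 = 3 & 5 (so the latch computes "input or latch"), and bad literal 4 reaches the bad state one step after the input is first set to 1.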


Platform. Experiments were performed on Rice University's DavinCI cluster,
which comprises 192 nodes running at 2.83 GHz with 48 GB of memory, running
RedHat 6.0. We set the memory limit to 8 GB with a wall-time limit of one hour.
Each model checking run has exclusive access to a node. A time penalty of one
hour is set for benchmarks that cannot be solved within the time/memory limits.


¹ We use version 2.0 available at https://ryanmb.bitbucket.io/truss/, which is
similar to the version available at https://github.com/mgudemann/iimc with the
addition of Quip [18].
² Personal communication with Alberto Griggio.
Table 2. Tools and algorithms (with category) evaluated in the experiments.

Category | Tool       | Algorithm                    | Configuration Flags
BMC      | ABC        | BMC (abc-bmc)                | -c 'bmc2'
IMC      | ABC        | IMC (abc-int)                | -c 'int'
IC3/PDR  | ABC        | PDR (abc-pdr)                | -c 'pdr'
IC3/PDR  | IIMC       | IC3 (iimc-ic3)               | -t ic3 --ic3_stats --print_cex --cex_aiger
IC3/PDR  | IIMC       | Quip [18] (iimc-quip)        | -t quip --quip-stats --print_cex --cex_aiger
IC3/PDR  | IC3Ref     | IC3 (ic3-ref)                | -b
IC3/PDR  | Simplic3   | IC3 (simplic3-best1)         | -s minisat -m 1 -u 4 [...]
IC3/PDR  | Simplic3   | IC3 (simplic3-best2)         | -s minisat -m 1 -u 4 -I 1 -D 0 -g 1 -X 0 -0 1 -c 0 -p 1 -d 2 -G 1 -P 1 -A 100
IC3/PDR  | Simplic3   | IC3 (simplic3-best3)         | -s minisat -m 1 -u 4 -I 0 -0 1 -c 0 -p 1 -d 2 -G 1 -P 1 -A 100 -a aic3
IC3/PDR  | Simplic3   | Avy [22] (simplic3-avy)      | -a avy
CAR      | CARChecker | Forward CAR* (carchk-f)      | -f
CAR      | CARChecker | Backward CAR* (carchk-b)     | -b
CAR      | SimpleCAR  | Forward CAR† (simpcar-f)     | -f -e
CAR      | SimpleCAR  | Backward CAR† (simpcar-b)    | -b -e
CAR      | SimpleCAR  | Forward CAR‡ (simpcar-fp)    | -f -p -e
CAR      | SimpleCAR  | Backward CAR‡ (simpcar-bp)   | -b -p -e

* with heuristics for minimal unsat core (MUC) [20], partial assignment [23], and propagation.
† no heuristics.
‡ with heuristic for PDR-like clause propagation.


3.2 Results


Error Report. We identify one bug in simplic3-best3, which reports safe instead
of unsafe, and 48 errors with respect to counterexample generation in the
iimc-quip algorithm (26) and in all algorithms of the Simplic3 tool (22). At the
time of writing, the developers of Simplic3 have confirmed our bug report. In our
analysis, we assume the results from these tools to be correct.


Coarse Analysis. We focus our analysis on unsafety checking. Figure 1 shows the
total number of unsafe benchmarks solved by each category (assuming a portfolio
run of all algorithms in a category). CAR complements BMC and IC3/PDR by
solving 128 benchmarks, of which 8 are not solved by any other category.
Although CAR solves the fewest benchmarks in total, its count of uniquely solved
benchmarks is comparable to that of the other categories. When the wall-time
limit is increased to 8h (the memory limit does not change), BMC and IC3/PDR can
only solve one of the 8 benchmarks uniquely solved by CAR. The analysis supports
our claim that CAR complements BMC/IC3/PDR on unsafety checking.

[Figure: bar chart of solved and uniquely solved unsafe benchmarks per algorithm category]
Fig. 1. Number of benchmarks solved by each algorithm category (run as a
portfolio). Uniquely solved benchmarks are not solved by any other category.

Fig. 2. Number of benchmarks solved by every algorithm in a category. Distinctly
solved benchmarks of an algorithm are not solved by any algorithm in other
categories. The union of the distinctly solved benchmarks over all algorithms in a
category equals the set of benchmarks uniquely solved by that category in Fig. 1.
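The counting behind Figs. 1 and 2 amounts to a few set operations. The Python sketch below reproduces those definitions on hypothetical data (the benchmark names and per-algorithm results are invented for illustration), treating each category as a portfolio of its algorithms, as in the text.

```python
from typing import Dict, Set

def solved_by_category(solved: Dict[str, Set[str]],
                       category: Dict[str, str]) -> Dict[str, Set[str]]:
    """Portfolio view: a category solves a benchmark if any of its algorithms does."""
    result: Dict[str, Set[str]] = {}
    for algo, benches in solved.items():
        result.setdefault(category[algo], set()).update(benches)
    return result

def _solved_elsewhere(by_cat: Dict[str, Set[str]], cat: str) -> Set[str]:
    """Benchmarks solved by at least one category other than cat."""
    return set().union(*(b for c, b in by_cat.items() if c != cat))

def uniquely_solved(solved, category) -> Dict[str, Set[str]]:
    """Fig. 1: per category, benchmarks that no other category solves."""
    by_cat = solved_by_category(solved, category)
    return {cat: benches - _solved_elsewhere(by_cat, cat) for cat, benches in by_cat.items()}

def distinctly_solved(solved, category) -> Dict[str, Set[str]]:
    """Fig. 2: per algorithm, benchmarks that no algorithm of another category solves."""
    by_cat = solved_by_category(solved, category)
    return {algo: benches - _solved_elsewhere(by_cat, category[algo])
            for algo, benches in solved.items()}
```

Taking the union of distinctly_solved over the algorithms of one category gives back exactly uniquely_solved for that category, which is the identity stated in the caption of Fig. 2.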


Granular Analysis. Figure 2 shows how each algorithm in the IC3/PDR
(Fig. 2a) and CAR (Fig. 2b) categories performs on the benchmarks. simpcar-bp
distinctly solves all 8 benchmarks uniquely solved by the CAR category
(Fig. 1), while no single IC3/PDR algorithm distinctly solves all benchmarks
uniquely solved by the IC3/PDR category. In fact, a portfolio must include at
least abc-pdr, simplic3-best1, and simplic3-best2 to solve all 8 instances
uniquely solved by the IC3/PDR category. It is important to note that SimpleCAR
is a very basic implementation of the CAR framework compared to the highly
optimized implementations of IC3/PDR in other tools. Even so, simpcar-b
outperforms four IC3/PDR implementations. Our results show that Backward-CAR
is a favorable algorithm for unsafety checking.


Analysis Conclusions. Backward-CAR presents a more promising research
direction than Forward-CAR for unsafety checking. We conjecture that the
performance of Forward- and Backward-CAR varies with the structure of the aiger
model. Heuristics and performance gains present a trade-off: simpcar-bp performs
better than the heuristic-heavy carchk-b, and it solves the most unsafe
benchmarks in the CAR category; however, adding the "propagation" heuristic also
affects its performance, since several benchmarks are solved by simpcar-b but not
by simpcar-bp.


4 Summary 


We present SimpleCAR, a safety model checker based on the CAR framework for
reachability analysis. Our tool is a lightweight and extensible implementation
of CAR with performance comparable to other state-of-the-art tool implementations
of highly optimized unsafety checking algorithms, and it complements existing
algorithm portfolios. Our empirical evaluation reveals that adding heuristics
does not always improve performance. We conclude that Backward-CAR is a more
promising research direction than Forward-CAR for unsafety checking, and our
tool serves as the "bottom-line" for all future extensions to the CAR framework.
Abstract. In this paper, we introduce StringFuzz: a modular SMT-LIB
problem instance transformer and generator for string solvers. We
supply a repository of instances generated by StringFuzz in SMT-LIB
2.0/2.5 format. We systematically compare Z3str3, CVC4, Z3str2, and
Norn on groups of such instances, and identify those that are particularly
challenging for some solvers. We briefly explain our observations and
show how StringFuzz helped discover causes of performance degradations
in Z3str3.


1 Introduction 


In recent years, many algorithms for solving string constraints have been devel- 
oped and implemented in SMT solvers such as Norn [6], CVC4 [12], and Z3 
(e.g., Z3str2 [13] and Z3str3 [7]). To validate and benchmark these solvers, their 
developers have relied on hand-crafted input suites [1,4,5] or real-world examples 
from a limited set of industrial applications [2,11]. These test suites have helped 
developers identify implementation defects and develop more sophisticated solving
heuristics. Unfortunately, as more features are added to solvers, these
benchmarks often remain stagnant, leaving a growing share of functionality
untested. As such, there is an acute need for a more robust, inexpensive, and
automatic way of generating benchmarks to test the correctness and performance
of SMT solvers.

Fuzzing has been used to test all kinds of software including SAT solvers 
[10]. Inspired by the utility of fuzzers, we introduce StringFuzz and describe its 
value as an exploratory testing tool. We demonstrate its efficacy by present- 
ing limitations it helped discover in leading string solvers. To the best of our 
knowledge, StringFuzz is the only tool aimed at automatic generation of string 
constraints. StringFuzz can be used to mutate or transform existing benchmarks, 
as well as randomly generate structured instances. These instances can be scaled 
with respect to a variety of parameters, e.g., length of string constants, depth of 
concatenations (concats) and regular expressions (regexes), number of variables, 
number of length constraints, and many more. 
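As an illustration of scaling two of these parameters (depth of concatenations and length of string constants), the Python sketch below emits an SMT-LIB instance asserting that a balanced tree of str.++ applications over fresh variables equals a random string constant. It is a hypothetical stand-in written for this example, not StringFuzz's Concats generator, and it does not reproduce StringFuzz's options or exact output format; it only uses the standard str.++ operator and the QF_S logic name.

```python
import random
import string
from typing import Iterator

def balanced_concat(depth: int, leaves: Iterator[str]) -> str:
    """A balanced tree of str.++ applications with 2**depth leaves taken from `leaves`."""
    if depth == 0:
        return next(leaves)
    left = balanced_concat(depth - 1, leaves)
    right = balanced_concat(depth - 1, leaves)
    return f"(str.++ {left} {right})"

def concat_instance(depth: int, literal_length: int, seed: int = 0) -> str:
    """SMT-LIB text: a depth-`depth` concat of fresh variables equals a random constant."""
    rng = random.Random(seed)  # fixed seed makes the generated instance reproducible
    names = [f"X{i}" for i in range(2 ** depth)]
    constant = "".join(rng.choice(string.ascii_lowercase) for _ in range(literal_length))
    lines = ["(set-logic QF_S)"]
    lines += [f"(declare-fun {name} () String)" for name in names]
    lines.append(f'(assert (= {balanced_concat(depth, iter(names))} "{constant}"))')
    lines.append("(check-sat)")
    return "\n".join(lines)
```

Each increment of depth doubles the number of variables and str.++ nodes, so a short sequence of depths already yields instances large enough to expose asymptotic solver behaviour.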


© The Author(s) 2018 
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 45-51, 2018. 
https://doi.org/10.1007/978-3-319-96142-2_6
Contributions


1. The StringFuzz tool: In Sect. 2, we describe a modular fuzzer that can
transform and generate SMT-LIB 2.0/2.5 string and regex instances. Scaling
inputs (e.g., long string constants, deep concatenations) are particularly
useful in identifying asymptotic behaviors in solvers, and StringFuzz has many
options to generate them. We briefly document StringFuzz's components and
modular architecture. We provide example use cases to demonstrate its utility
as an exploratory solver testing tool.

2. A repository of SMT-LIB 2.0/2.5 instances: In Sect. 3, we present a
repository of SMT-LIB 2.0/2.5 string and regex instance suites that we
generated using StringFuzz. This repository consists of two categories: one
with new instances generated by StringFuzz (generated), and another with
transformed instances generated from a small suite of industrial benchmarks
(transformed).

3. Experimental Results and Analysis: In Sect. 4, we compare the performance of
Z3str3, CVC4, Z3str2, and Norn on the StringFuzz suites Concats-Balanced,
Concats-Big, Concats-Extracts-Small, and Different-Prefix. We highlight these
suites because they make some solvers perform poorly, but not others. We
analyze our experimental results and pinpoint algorithmic limitations in
Z3str3 that cause poor performance.


2 StringFuzz 


Implementation and Architecture. StringFuzz is implemented as a Python
package, and comes with several executables to generate, transform, and analyze
SMT-LIB 2.0/2.5 string and regex instances. Its components are implemented as
UNIX "filters" to enable easy integration with other tools (including each other).
For example, the outputs of generators can be piped into transformers, and
transformers can be chained to produce a stream of tuned inputs to a solver.
StringFuzz is composed of the following tools:


stringfuzzg 
This tool generates SMT-LIB instances. It supports several generators and 
options that specify its output. Details can be found in Table 1a.

stringfuzzx 
This tool transforms SMT-LIB instances. It supports several transform- 
ers and options that specify its output and input, which are explained in 
Table 1b. Note that transformers Translate and Reverse also preserve satis- 
fiability under certain conditions. 

stringstats 
This tool takes an SMT-LIB instance as input and outputs its properties: the 
number of variables/literals, the max/median syntactic depth of expressions, 
the max/median literal length, etc. 


1 All source code, problem suites, and supplementary material referenced in this paper 
are available at the StringFuzz website [3]. 
Table 1. StringFuzz built-in (a) generators and (b) transformers.

(a) stringfuzzg built-in generators.

Name          Generates instances that have ...
Concats       Long concats and optional random extracts.
Lengths       Many variables (and their concats) with length constraints.
Overlaps      An expression of the form A.X = X.B.
Equality      An equality among concats, each with variables or constants.
Regex         Regexes of varying complexity.
Random-Text   Totally random ASCII text.
Random-AST    Random string and regex constraints.

(b) stringfuzzx built-in transformers.

Name          The transformer ...
Fuzz          Replaces literals and operators with similar ones.
Graft         Randomly swaps non-leaf nodes with leaf nodes.
Multiply      Multiplies integers and repeats strings by N.
Nop           Does nothing (can translate between SMT-LIB 2.0/2.5).
Reverse*      Reverses all string literals and concat arguments.
Rotate        Rotates compatible nodes in syntax tree.
Translate†    Permutes the alphabet.
Unprintable   Replaces characters in literals with unprintable ones.

* Can guarantee satisfiable output instances from satisfiable input instances [3].
† Can guarantee input and output instances will be equisatisfiable [3].


We organized StringFuzz to be easily extended. To illustrate, the whole project
contains 3,183 lines of code, yet creating a transformer takes only 45 lines of
code on average. StringFuzz can be installed from source or from the Python
Package Index (PyPI) with pip.
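To make the claim about transformer size concrete, the following is a hypothetical sketch of what the Reverse transformer from Table 1b does to a parsed syntax tree. The tuple-based AST and the functions `reverse_transform` and `to_smtlib` are illustrative assumptions, not StringFuzz's actual classes or API, and the sketch makes no claim about the satisfiability guarantee noted in Table 1.

```python
# Hypothetical minimal AST (not StringFuzz's internal classes):
#   ("lit", "abc")          a string literal
#   ("sym", "x")            a variable or other identifier
#   ("app", op, [args])     an application such as (str.++ ...)

def reverse_transform(node):
    """Reverse every string literal and the argument order of every str.++."""
    kind = node[0]
    if kind == "lit":
        return ("lit", node[1][::-1])
    if kind == "sym":
        return node
    _, op, args = node
    new_args = [reverse_transform(arg) for arg in args]
    if op == "str.++":
        new_args.reverse()  # reversing X.Y yields rev(Y).rev(X)
    return ("app", op, new_args)


def to_smtlib(node):
    """Print the AST as SMT-LIB text (escaping of quotes is ignored here)."""
    kind = node[0]
    if kind == "lit":
        return '"' + node[1] + '"'
    if kind == "sym":
        return node[1]
    _, op, args = node
    return "(" + " ".join([op] + [to_smtlib(arg) for arg in args]) + ")"


expr = ("app", "=", [("sym", "x"),
                     ("app", "str.++", [("lit", "ab"), ("sym", "y"), ("lit", "cd")])])
print(to_smtlib(reverse_transform(expr)))  # (= x (str.++ "dc" y "ba"))
```

Applying the transformation twice gives back the original tree, which is one quick sanity check for a transformer of this kind.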


Regex Generating Capabilities. StringFuzz can generate and transform 
instances with regex constraints. For example, the command “stringfuzzg 
regex -r 2 -d 1 -t 1 -M 3 -X 10” produces this instance: 


(set-logic QF_S) 

(declare-fun var0 () String)

(assert (str.in.re var0 (re.+ (str.to.re "R5"))))
(assert (str.in.re var0 (re.+ (str.to.re "!PC"))))
(assert (<= 3 (str.len var0))) 

(assert (<= (str.len var0) 10)) 

(check-sat) 


Each instance is a set of one or more regex constraints on a single variable, 
with optional maximum and minimum length constraints. Each regex constraint 
is a concatenation (re.++ in SMT-LIB string syntax) of regex terms: 


(re.++ T1 (re.++ T2 ... (re.++ Tn-1 Tn)))
and each term Ti is recursively defined as any one of: Kleene star (re.*),
Kleene plus (re.+), union (re.union), or a literal. Operators are nested up to
a depth of recursion specified with the --depth flag; terms at depth 0 are regex
constants. Below are three example regexes of depth 2 that can be produced this
way, written in conventional regex syntax rather than SMT-LIB syntax:

((a|b)|(cc)+)        ((ddd)*)+        ((ee)+|(fff)*)
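The recursive construction above can be sketched as a small generator. This is a hypothetical reimplementation, not StringFuzz's code: the alphabet, the literal length bound, and the uniform choice among re.*, re.+ and re.union are assumptions, and every term is built to exactly the requested depth rather than up to it.

```python
import random

OPERATORS = ("re.*", "re.+", "re.union")


def random_literal(rng, alphabet="abcdef", max_len=3):
    """A depth-0 term: a short constant string lifted to a regex."""
    text = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
    return '(str.to.re "' + text + '")'


def random_term(rng, depth):
    """A regex term in which every path to a literal crosses `depth` operators."""
    if depth == 0:
        return random_literal(rng)
    op = rng.choice(OPERATORS)
    if op == "re.union":
        return f"(re.union {random_term(rng, depth - 1)} {random_term(rng, depth - 1)})"
    return f"({op} {random_term(rng, depth - 1)})"


def random_regex(rng, num_terms, depth):
    """Right-nested concatenation (re.++ T1 (re.++ T2 ... (re.++ Tn-1 Tn)))."""
    terms = [random_term(rng, depth) for _ in range(num_terms)]
    result = terms[-1]
    for term in reversed(terms[:-1]):
        result = f"(re.++ {term} {result})"
    return result


print(random_regex(random.Random(42), num_terms=3, depth=2))
```

Passing an explicit `random.Random` seed keeps generated instances reproducible, which is convenient when a solver failure has to be replayed.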


Equisatisfiable String Transformations. StringFuzz can also transform
problem instances by manipulating their parsed syntax trees. By default, most
built-in transformers guarantee only well-formedness, but some also preserve
satisfiability or even guarantee equisatisfiability. Table 1b lists the built-in
transformers and notes these guarantees.


Example Use Case. In Sect. 3 we use StringFuzz to generate benchmark suites 
in a batch mode. We can also use StringFuzz for on-line exploratory debugging. 
For example, the script below repeatedly feeds random StringFuzz instances to 
CVC4 until the solver produces an error: 


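# The loop ends at the first non-zero exit status of the pipeline, i.e. of cvc4
# (its last command); the offending input is then left in instance.smt25.
while stringfuzzg -r random-ast -m \
    | tee instance.smt25 | cvc4 --lang smt2.5 --tlimit=5000 --strings-exp; do
  sleep 0
done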


3 Instance Suites 


In this section, we describe the benchmark suites we generated with String- 
Fuzz, and on which we conducted our experimental evaluation. Table 2a lists 
instances that were generated by stringfuzzg. Table 2b lists instances derived 
from existing seed instances by iteratively applying stringfuzzx. Every trans- 
formed instance is named according to its seed and the transformations it under- 
took. For example, z3-regex-1-fuzz-graft.smt2 was transformed by applying 
Fuzz and then Graft to z3-regex-1.smt2. 

The Amazon category contains 472 instances derived from two seeds supplied 
by our industrial collaborators. The Regex category is seeded by the Z3str2 regex 
test suite [4], which contains 42 instances. Through cumulative transformations 
we expanded the 42 seeds to 7,551 unique instances. Finally, the Sanitizer cat- 
egory is obtained from five industrial e-mail address and IPv4 sanitizers. 


4 Experimental Results and Analysis 


We generated several problem instance suites with StringFuzz that made one
solver perform poorly, but not others.2 They are Concats-Balanced, Concats-Big,
Concats-Extracts-Small, and Different-Prefix. Figure 1 shows the suites that


2 Only the results that made one solver perform poorly and not others are presented,
but results for all StringFuzz suites are available on the StringFuzz website [3]. 
Table 2. Repository of 10,258 SMT-LIB 2.0/2.5 instances.
(a) stringfuzzg-generated instances. 
Name Instances have a ... Quantity 
Concats-{ Small, Big} Right-heavy, deep tree of concats. 120 
Concats-Balanced Balanced, deep tree of concats. 100 
Concats-Extracts-{ Small, Big} Single concat tree, with character extractions. 120 
Lengths-{ Long, Short} Single, large length constraint on a variable. 200 
Lengths-Concats Tree of fixed-length concats of variables. 100 
Overlaps-{ Small, Big} Formula of the form A.X = X.B. 80 
Regex-{ Small, Big} Complex regex membership test. 120 
Many-Regexes Multiple random regex membership tests. 40 
Regex-Deep Regex membership test with many nested operators. 45 
Regex-Pair Test for membership in one regex, but not another. 40 
Regex-Lengths Regex membership test, and a length constraint. 40 
Different-Prefix Equality of two deep concats with different prefixes. 60 


(b) stringfuzzx-generated instances. 


Name Seed Quantity 
Amazon Two industrial regex membership instances. 472 
Regex Z3str2 regular expression test suite. 7,551 
Sanitizer Five e-mail and IPv4 sanitizer examples. 1,170
were uniquely difficult for CVC4. Figure 2 shows the suites that were uniquely
difficult for Z3str3. All experiments were conducted in series, each with a timeout
of 15 s, on an Ubuntu Linux 16.04 computer with 32 GB of RAM and an Intel® Core™
i7-6700 CPU (3.40 GHz).

Fig. 1. Instances hard for CVC4.


Usefulness to Z3str3: A Case Study. StringFuzz’s ability to produce scaling 
instances helped uncover several implementation issues and performance limita- 
tions in Z3str3. Scaling inputs can reveal issues that would normally be out of 
scope for unit tests or industrial benchmarks. Three different performance and
implementation bugs were identified and fixed in Z3str3 as a result of testing
with the StringFuzz scaling suites Lengths-Long and Concats-Big.

Fig. 2. Instances hard for Z3str3.

StringFuzz also helped identify a number of performance-related issues and 
opportunities for new heuristics in Z3str3. For example, by examining Z3str3’s 
execution traces on the instances in the Concats-Big suite, we discovered a
potential new heuristic. In particular, Z3str3 does not make full use of the
solving context (e.g., the knowledge that some terms are empty strings) to
simplify the concatenation of a long list of string terms before trying to
reason about the equivalences among subterms. As a result, Z3str3 introduces a
large number of unnecessary intermediate variables and propagations.
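The potential heuristic can be illustrated on a toy term representation: flatten a deep concatenation into its list of leaves and drop every leaf that the current context already knows to be empty, before any reasoning about equalities among subterms. This is a sketch of the idea, not Z3str3 code; the term encoding and the `known_empty` set (standing in for facts from the solving context) are assumptions.

```python
# Toy term representation (an assumption for this sketch):
#   "x"                     a string variable
#   ("const", "ab")         a string constant
#   ("concat", left, right) binary concatenation

def flatten(term):
    """List the leaves of a concat tree from left to right. The traversal is
    iterative so that very deep trees do not exhaust Python's recursion limit."""
    leaves, stack = [], [term]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple) and node[0] == "concat":
            stack.append(node[2])  # pushed first, so visited after the left subtree
            stack.append(node[1])
        else:
            leaves.append(node)
    return leaves


def simplify_concat(term, known_empty):
    """Drop empty constants and variables the context already knows to be empty."""
    kept = []
    for leaf in flatten(term):
        if leaf == ("const", ""):
            continue
        if isinstance(leaf, str) and leaf in known_empty:
            continue
        kept.append(leaf)
    return kept


t = ("concat", "x", ("concat", ("const", ""), ("concat", "y", ("const", "ab"))))
print(simplify_concat(t, known_empty={"y"}))  # ['x', ('const', 'ab')]
```

Only the leaves that survive this pass would need intermediate variables, which is the saving the heuristic aims for.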


5 Related Work 


Many solver developers create their own test suites to validate their solvers [1, 
4,5]. Several popular instance suites are also publicly available for solver testing 
and benchmarking, such as the Kaluza [2] and Kausler [11] suites. There are 
likewise several fuzzers and instance generators currently available, but none of 
them can generate or transform string and regex instances. For example, the 
FuzzSMT [9] tool generates SMT-LIB instances with bit-vectors and arrays, 
but does not support strings or regexes. The SMTpp [8] tool pre-processes and 
simplifies instances, but does not generate new ones or fuzz existing ones. 
Abstract. Information about the memory locations accessed by a program
is, for instance, required for program parallelisation and program
verification. Existing inference techniques for this information provide 
only partial solutions for the important class of array-manipulating pro- 
grams. In this paper, we present a static analysis that infers the memory 
footprint of an array program in terms of permission pre- and postcon- 
ditions as used, for example, in separation logic. This formulation allows 
our analysis to handle concurrent programs and produces specifications 
that can be used by verification tools. Our analysis expresses the permis- 
sions required by a loop via maximum expressions over the individual 
loop iterations. These maximum expressions are then solved by a novel 
maximum elimination algorithm, in the spirit of quantifier elimination. 
Our approach is sound and is implemented; an evaluation on existing 
benchmarks for memory safety of array programs demonstrates accurate 
results, even for programs with complex access patterns and nested loops. 


1 Introduction 


Information about the memory locations accessed by a program is crucial for 
many applications such as static data race detection [45], code optimisation 
[16,26,33], program parallelisation [5,17], and program verification [23, 30, 38,39]. 
The problem of inferring this information statically has been addressed by a 
variety of static analyses, e.g., [9,42]. However, prior works provide only partial 
solutions for the important class of array-manipulating programs for at least 
one of the following reasons. (1) They approximate the entire array as one single 
memory location [4] which leads to imprecise results; (2) they do not produce 
specifications, which are useful for several important applications such as human 
inspection, test case generation, and especially deductive program verification; 
(3) they are limited to sequential programs. 

In this paper, we present a novel analysis for array programs that addresses 
these shortcomings. Our analysis employs the notion of access permission from 
separation logic and similar program logics [40,43]. These logics associate a per- 
mission with each memory location and enforce that a program part accesses a 
location only if it holds the associated permission. In this setting, determining
the accessed locations amounts to inferring a sufficient precondition that specifies
the permissions required by a program part.

Phrasing the problem as one of permission inference allows us to address 
the three problems mentioned above. (1) We distinguish different array elements 
by tracking the permission for each element separately. (2) Our analysis infers 
pre- and postconditions for both methods and loops and emits them in a form 
that can be used by verification tools. The inferred specifications can easily be 
complemented with permission specifications for non-array data structures and 
with functional specifications. (3) We support concurrency in three important 
ways. First, our analysis is sound for concurrent program executions because 
permissions guarantee that program executions are data race free and reduce 
thread interactions to specific points in the program such as forking or joining 
a thread, or acquiring or releasing a lock. Second, we develop our analysis for a 
programming language with primitives that represent the ownership transfer that 
happens at these thread interaction points. These primitives, inhale and exhale 
[31,38], express that a thread obtains permissions (for instance, by acquiring a 
lock) or loses permissions (for instance, by passing them to another thread along 
with a message) and can thereby represent a wide range of thread interactions 
in a uniform way [32,44]. Third, our analysis distinguishes read and write access 
and, thus, ensures exclusive writes while permitting concurrent read accesses. 
As is standard, we employ fractional permissions [6] for this purpose; a full
permission is required to write to a location, but any positive fraction permits
read access.


Approach. Our analysis reduces the problem of reasoning about permissions for
array elements to reasoning about numerical values for permission fractions. To
achieve this, we represent permission fractions for all array elements $q_a[q_i]$ using
a single numerical expression $t(q_a, q_i)$ parameterised by $q_a$ and $q_i$. For instance,
the conditional term $(q_a = a \land q_i = j\ ?\ 1 : 0)$ represents full permission (denoted by
1) for array element a[j] and no permission for all other array elements.

Our analysis employs a precise backwards analysis for loop-free code: a varia- 
tion on the standard notion of weakest preconditions. We apply this analysis to 
loop bodies to obtain a permission precondition for a single loop iteration. Per 
array element, the whole loop requires the maximum fraction over all loop iter- 
ations, adjusted by permissions gained and lost during loop execution. Rather 
than computing permissions via a fixpoint iteration (for which a precise widen- 
ing operator is difficult to design), we express them as a maximum over the 
variables changed by the loop execution. We then use inferred numerical invari- 
ants on these variables and a novel maximum elimination algorithm to infer a 
specification for the entire loop. Permission postconditions are obtained analo- 
gously. 

method copyEven(a: Int[]) {
  var j, v: Int := 0;
  while (j < length(a)) {
    if (j % 2 == 0) { v := a[j] }
    else { a[j] := v };
    j := j + 1
  }
}

Fig. 1. Program copyEven.

method parCopyEven(a: Int[]) {
  var j: Int := 0;
  while (j < length(a)/2) {
    exhale(a, 2*j, 1/2);
    exhale(a, 2*j+1, 1);
    j := j + 1
  }
}

Fig. 2. Program parCopyEven.

$$
\begin{aligned}
e &::= n \mid x \mid n \cdot x \mid e_1 + e_2 \mid e_1 - e_2 \mid a[e] \mid \mathit{len}(a) \mid (b\ ?\ e_1 : e_2) \\
b &::= e_1\ \mathit{op}\ e_2 \mid e \% n = 0 \mid e \% n \neq 0 \mid b_1 \land b_2 \mid b_1 \lor b_2 \mid \neg b \\
\mathit{op} &\in \{=, \neq, <, \leq, >, \geq\} \\
p &::= q \mid \mathit{rd} \mid p_1 + p_2 \mid p_1 - p_2 \mid \min(p_1, p_2) \mid \max(p_1, p_2) \mid (b\ ?\ p_1 : p_2) \\
s &::= \mathtt{skip} \mid x := e \mid a_1 := a_2 \mid x := a[e] \mid a[e_1] := e_2 \mid \mathtt{exhale}(a, e, p) \mid \mathtt{inhale}(a, e, p) \\
  &\quad \mid (s_1; s_2) \mid \mathtt{if}\ (b)\ \{\, s_1 \,\}\ \mathtt{else}\ \{\, s_2 \,\} \mid \mathtt{while}\ (b)\ \{\, s \,\}
\end{aligned}
$$

Fig. 3. Programming Language. n ranges over integer constants, x over integer
variables, a over array variables, q over non-negative fractional (permission-typed)
constants. e stands for integer expressions, and b for boolean. Permission expressions p are
a separate syntactic category.

For the method copyEven in Fig. 1, the analysis determines that the permission
amount required by a single loop iteration is
$(j\%2=0\ ?\ (q_a=a \land q_i=j\ ?\ \mathit{rd} : 0) : (q_a=a \land q_i=j\ ?\ 1 : 0))$.
The symbol rd represents a fractional read permission. Using a suitable integer
invariant for the loop counter j, we obtain the loop precondition
$\max_{j \mid 0 \le j < \mathit{len}(a)}((j\%2=0\ ?\ (q_a=a \land q_i=j\ ?\ \mathit{rd} : 0) : (q_a=a \land q_i=j\ ?\ 1 : 0)))$.
Our maximum elimination algorithm obtains
$(q_a=a \land 0 \le q_i < \mathit{len}(a)\ ?\ (q_i\%2=0\ ?\ \mathit{rd} : 1) : 0)$.
By ranging over all $q_a$ and $q_i$, this can be read as read permission for even
indices and write permission for odd indices within the bounds of array a.
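The effect of maximum elimination on copyEven can be sanity-checked numerically for small arrays. The sketch below is not the elimination algorithm of Sect. 5. It enumerates the loop counter j over 0 ≤ j < len(a), takes the pointwise maximum of the single-iteration requirement, and compares the result with the closed form above. Instantiating rd as 1/2 and representing arrays by name strings are assumptions made for the example.

```python
from fractions import Fraction

RD = Fraction(1, 2)  # assumption: rd instantiated with the concrete read fraction 1/2


def iteration_pre(j, qa, qi, a="a"):
    """Permission one copyEven iteration (counter value j) needs for qa[qi]."""
    if qa != a or qi != j:
        return Fraction(0)
    return RD if j % 2 == 0 else Fraction(1)


def loop_pre_enumerated(length, qa, qi):
    """max over j with 0 <= j < length of iteration_pre; the empty maximum is 0."""
    return max((iteration_pre(j, qa, qi) for j in range(length)), default=Fraction(0))


def loop_pre_closed_form(length, qa, qi, a="a"):
    """(qa=a and 0<=qi<len(a) ? (qi%2=0 ? rd : 1) : 0), the eliminated maximum."""
    if qa == a and 0 <= qi < length:
        return RD if qi % 2 == 0 else Fraction(1)
    return Fraction(0)


for length in range(6):
    for qa in ("a", "b"):
        for qi in range(-2, 8):
            assert loop_pre_enumerated(length, qa, qi) == loop_pre_closed_form(length, qa, qi)
print("closed form agrees with enumeration")
```

Enumeration only works for a fixed, small len(a); the point of maximum elimination is to produce the closed form symbolically, for all array lengths at once.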


Contributions. The contributions of our paper are: 


1. A novel permission inference that uses maximum expressions over parameterised
arithmetic expressions to summarise loops (Sects. 3 and 4)

2. An algorithm for eliminating maximum (and minimum) expressions over an
unbounded number of cases (Sect. 5)

3. An implementation of our analysis, which will be made available as an artifact

4. An evaluation on benchmark examples from existing papers and competitions,
demonstrating that we obtain sound, precise, and compact specifications, even
for challenging array access patterns and parallel loops (Sect. 6)

5. Proof sketches for the soundness of our permission inference and correctness
of our maximum elimination algorithm (in the technical report (TR) [15])


2 Programming Language 


We define our inference technique over the programming language in Fig. 3. Pro- 
grams operate on integers (expressions e), booleans (expressions b), and one- 
dimensional integer arrays (variables a); a generalisation to other forms of arrays 
is straightforward and supported by our implementation. Arrays are read and
updated via the statements x := a[e] and a[e] := x; array lookups in expressions
are not part of the surface syntax, but are used internally by our analysis.
Permission expressions p evaluate to rational numbers; rd, min, and max are for
internal use.

A full-fledged programming language contains many statements that affect 
the ownership of memory locations, expressed via permissions [32,44]. For exam- 
ple in a concurrent setting, a fork operation may transfer permissions to the new 
thread, acquiring a lock obtains permission to access certain memory locations, 
and messages may transfer permissions between sender and receiver. Even in 
a sequential setting, the concept is useful: in procedure-modular reasoning, a 
method call transfers permissions from the caller to the callee, and back when 
the callee terminates. Allocation can be represented as obtaining a fresh object 
and then obtaining permission to its locations. 

For the purpose of our permission inference, we can reduce all of these
operations to two basic statements that directly manipulate the permissions
currently held [31,38]. An inhale(a,e,p) statement adds the amount p of
permission for the array location a[e] to the currently held permissions. Dually,
an exhale(a,e,p) statement requires that this amount of permission is already
held, and then removes it. We assume that for any inhale or exhale statements, 
the permission expression p denotes a non-negative fraction. For simplicity, we 
restrict inhale and exhale statements to a single array location, but the exten- 
sion to unboundedly-many locations from the same array is straightforward [37]. 


Semantics. The operational semantics of our language is mostly standard, but 
is instrumented with additional state to track how much permission is held to 
each heap location; a program state therefore consists of a triple of heap H 
(mapping pairs of array identifier and integer index to integer values), a
permission map P, mapping such pairs to permission amounts, and an environment σ
mapping variables to values (integers or array identifiers).

The execution of inhale or exhale statements causes modifications to the 
permission map, and all array accesses are guarded with checks that at least 
some permission is held when reading and that full (1) permission is held when 
writing [6]. If these checks (or an exhale statement) fail, the execution terminates 
with a permission failure. Permission amounts greater than 1 indicate invalid 
states that cannot be reached by a program execution. We model run-time errors 
other than permission failures (in particular, out-of-bounds accesses) as stuck 
configurations. 
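The instrumented semantics can be sketched as a small state class that tracks the heap, the permission map, and the environment. The class and method names are hypothetical. Exact fractions model permission amounts, a `PermissionFailure` exception models permission failures, and a separate `Stuck` exception stands in for stuck configurations such as out-of-bounds accesses.

```python
from fractions import Fraction


class PermissionFailure(Exception):
    """An access or exhale statement lacked the required permission."""


class Stuck(Exception):
    """A run-time error other than a permission failure, e.g. out of bounds."""


class State:
    """Heap H, permission map P, and environment for integer arrays."""

    def __init__(self, lengths):
        self.lengths = dict(lengths)                          # array id -> length
        self.heap = {(a, i): 0 for a, n in self.lengths.items() for i in range(n)}
        self.perms = {}                                       # (array id, index) -> Fraction
        self.env = {}                                         # variable -> value

    def held(self, a, i):
        return self.perms.get((a, i), Fraction(0))

    def inhale(self, a, i, p):
        self.perms[(a, i)] = self.held(a, i) + Fraction(p)

    def exhale(self, a, i, p):
        if self.held(a, i) < Fraction(p):                     # amount must already be held
            raise PermissionFailure(f"exhale {p} of {a}[{i}]")
        self.perms[(a, i)] = self.held(a, i) - Fraction(p)

    def _in_bounds(self, a, i):
        if not 0 <= i < self.lengths[a]:
            raise Stuck(f"{a}[{i}] out of bounds")

    def read(self, x, a, i):                                  # x := a[i]
        self._in_bounds(a, i)
        if self.held(a, i) <= 0:                              # some permission needed
            raise PermissionFailure(f"read of {a}[{i}]")
        self.env[x] = self.heap[(a, i)]

    def write(self, a, i, value):                             # a[i] := value
        self._in_bounds(a, i)
        if self.held(a, i) < 1:                               # full permission needed
            raise PermissionFailure(f"write of {a}[{i}]")
        self.heap[(a, i)] = value
```

For example, after `s = State({"a": 4}); s.inhale("a", 1, Fraction(1, 2))`, the read `s.read("v", "a", 1)` succeeds, while `s.write("a", 1, 7)` raises `PermissionFailure` until a second half permission is inhaled.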


3 Permission Inference for Loop-Free Code 


Our analysis infers a sufficient permission precondition and a guaranteed permis- 
sion postcondition for each method of a program. Both conditions are mappings 
from array elements to permission amounts. Executing a statement s in a state 
$$
\begin{aligned}
\mathit{pre}(\mathtt{skip}, p) &= p \\
\mathit{pre}((s_1; s_2), p) &= \mathit{pre}(s_1, \mathit{pre}(s_2, p)) \\
\mathit{pre}(x := e, p) &= p[e/x] \\
\mathit{pre}(x := a[e], p) &= \max(p[a[e]/x], \alpha_{a,e}(\mathit{rd})) \\
\mathit{pre}(a[e] := x, p) &= \max(p[a'[e'] \mapsto (e = e' \land a = a'\ ?\ x : a'[e'])], \alpha_{a,e}(1)) \\
\mathit{pre}(\mathtt{exhale}(a, e, p'), p) &= p + \alpha_{a,e}(p') \\
\mathit{pre}(\mathtt{inhale}(a, e, p'), p) &= \max(0, p - \alpha_{a,e}(p')) \\
\mathit{pre}(\mathtt{if}\ (b)\ \{\, s_1 \,\}\ \mathtt{else}\ \{\, s_2 \,\}, p) &= (b\ ?\ \mathit{pre}(s_1, p) : \mathit{pre}(s_2, p)) \\[6pt]
\Delta(\mathtt{skip}, p) &= p \\
\Delta((s_1; s_2), p) &= \Delta(s_1, \Delta(s_2, p)) \\
\Delta(x := e, p) &= p[e/x] \\
\Delta(x := a[e], p) &= p[a[e]/x] \\
\Delta(a[e] := x, p) &= p[a'[e'] \mapsto (e = e' \land a = a'\ ?\ x : a'[e'])] \\
\Delta(\mathtt{exhale}(a, e, p'), p) &= p - \alpha_{a,e}(p') \\
\Delta(\mathtt{inhale}(a, e, p'), p) &= p + \alpha_{a,e}(p') \\
\Delta(\mathtt{if}\ (b)\ \{\, s_1 \,\}\ \mathtt{else}\ \{\, s_2 \,\}, p) &= (b\ ?\ \Delta(s_1, p) : \Delta(s_2, p))
\end{aligned}
$$

Fig. 4. The backwards analysis rules for permission preconditions and relative
permission differences. The notation $\alpha_{a,e}(p)$ is a shorthand for
$(q_a=a \land q_i=e\ ?\ p : 0)$ and denotes p permission for the array location a[e].
Moreover, $p[a'[e'] \mapsto e]$ matches all array accesses in p and replaces them with
the expression obtained from e by substituting all occurrences of a' and e' with the
matched array and index, respectively. The cases for inhale statements are slightly
simplified; the full rules are given in Fig. 6 of the TR [15].


whose permission map P contains at least the permissions required by a sufficient
permission precondition for s is guaranteed not to result in a permission
failure. A guaranteed permission postcondition expresses the permissions that 
will at least be held when s terminates (see Sect. A of the TR [15] for formal 
definitions). 

In this section, we define inference rules to compute sufficient permission 
preconditions for loop-free code. For programs which do not add or remove per- 
missions via inhale and exhale statements, the same permissions will still be 
held after executing the code; however, to infer guaranteed permission postcon- 
ditions in the general case, we also infer the difference in permissions between 
the state before and after the execution. We will discuss loops in the next section. 
Non-recursive method calls can be handled by applying our analysis bottom-up 
in the call graph and using inhale and exhale statements to model the permis- 
sion effect of calls. Recursion can be handled similarly to loops, but is omitted 
here. 

We define our permission analysis to track and generate permission expressions
parameterised by two distinguished variables $q_a$ and $q_i$; by parameterising
our expressions in this way, we can use a single expression to represent a
permission amount for each pair of $q_a$ and $q_i$ values.


Preconditions. The permission precondition of a loop-free statement s and a 
postcondition permission p (in which $q_a$ and $q_i$ potentially occur) is denoted by
pre(s,p), and is defined in Fig. 4. Most rules are straightforward adaptations of a 
classical weakest-precondition computation. Array lookups require some permis- 
sion to the accessed array location; we use the internal expression rd to denote 
a non-zero permission amount; a post-processing step can later replace rd by 
a concrete rational. Since downstream code may require further permission for
this location, represented by the permission expression p, we take the maximum 
of both amounts. Array updates require full permission and need to take alias- 
ing into account. The case for inhale subtracts the inhaled permission amount 
from the permissions required by downstream code; the case for exhale adds the 
permissions to be exhaled. Note that this addition may lead to a required per- 
mission amount exceeding the full permission. This indicates that the statement 
is not feasible, that is, all executions will lead to a permission failure. 

To illustrate our pre definition, let s be the body of the loop in the parCopyEven
method in Fig. 2. The precondition
$\mathit{pre}(s,0) = (q_a=a \land q_i=2 \cdot j\ ?\ 1/2 : 0) + (q_a=a \land q_i=2 \cdot j+1\ ?\ 1 : 0)$
expresses that a loop iteration requires a half permission for
the even elements of array a and full permission for the odd elements.
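The pre rules of Fig. 4 can be prototyped by representing a permission expression as a function from an environment (program variables plus $q_a$ and $q_i$) to a fraction, so that the substitution p[e/x] becomes evaluating p in an updated environment. This hypothetical sketch covers only skip, sequencing, assignment, exhale, inhale (in the simplified form of Fig. 4), and conditionals. Array reads and writes are omitted because their rules substitute array lookups, and arrays are identified by name strings.

```python
from fractions import Fraction

# A permission expression is a function from an environment (program variables
# plus the distinguished "qa" and "qi") to a Fraction.
ZERO = lambda env: Fraction(0)


def const(q):
    return lambda env: Fraction(q)


def alpha(a, e, p):
    """alpha_{a,e}(p) = (qa=a and qi=e ? p : 0)."""
    return lambda env: p(env) if env["qa"] == a and env["qi"] == e(env) else Fraction(0)


def pre(stmt, p):
    """Backwards precondition for skip, seq, x := e, exhale, inhale and if."""
    kind = stmt[0]
    if kind == "skip":
        return p
    if kind == "seq":                                   # pre(s1, pre(s2, p))
        return pre(stmt[1], pre(stmt[2], p))
    if kind == "assign":                                # p[e/x]
        _, x, e = stmt
        return lambda env: p({**env, x: e(env)})
    if kind == "exhale":                                # p + alpha_{a,e}(p')
        needed = alpha(*stmt[1:])
        return lambda env: p(env) + needed(env)
    if kind == "inhale":                                # max(0, p - alpha_{a,e}(p'))
        gained = alpha(*stmt[1:])
        return lambda env: max(Fraction(0), p(env) - gained(env))
    if kind == "if":                                    # (b ? pre(s1,p) : pre(s2,p))
        _, b, s1, s2 = stmt
        p1, p2 = pre(s1, p), pre(s2, p)
        return lambda env: p1(env) if b(env) else p2(env)
    raise ValueError(f"unsupported statement: {kind}")


# Loop body of parCopyEven (Fig. 2).
body = ("seq", ("exhale", "a", lambda env: 2 * env["j"], const(Fraction(1, 2))),
        ("seq", ("exhale", "a", lambda env: 2 * env["j"] + 1, const(1)),
         ("assign", "j", lambda env: env["j"] + 1)))
p = pre(body, ZERO)
print(p({"j": 3, "qa": "a", "qi": 6}), p({"j": 3, "qa": "a", "qi": 7}))  # 1/2 1
```

Evaluating the result at j = 3 reproduces the half permission for a[6] and the full permission for a[7]. Exhaling full permission to the same location twice yields a required amount of 2, the infeasibility signal discussed above.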


Postconditions. The final state of a method execution includes the permissions 
held in the method pre-state, adjusted by the permissions that are inhaled or 
exhaled during the method execution. To perform this adjustment, we compute 
the difference in permissions before and after executing a statement. The rela- 
tive permission difference for a loop-free statement s and a permission expression 
p (in which $q_a$ and $q_i$ potentially occur) is denoted by $\Delta(s,p)$, and is defined
backward, analogously to pre in Fig. 4. The second parameter p acts as an
accumulator; the difference in permission is represented by evaluating $\Delta(s, 0)$.

For a statement s with precondition pre(s,0), we obtain the postcondition
$\mathit{pre}(s,0)+\Delta(s,0)$. Let s again be the loop body from parCopyEven. Since s contains
exhale statements, we obtain
$\Delta(s,0) = 0 - (q_a=a \land q_i=2 \cdot j\ ?\ 1/2 : 0) - (q_a=a \land q_i=2 \cdot j+1\ ?\ 1 : 0)$.
Thus, the postcondition $\mathit{pre}(s,0) + \Delta(s,0)$ can be simplified to
0. This reflects the fact that all required permissions for a single loop iteration
are lost by the end of its execution.
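The relative permission difference for the same loop body can be checked in the same style. The sketch below is self-contained and hypothetical: `delta` implements the Δ rules of Fig. 4 for skip, sequencing, assignment, exhale and inhale, and `pre_body` hard-codes pre(s,0) as stated above rather than computing it.

```python
from fractions import Fraction


def alpha(a, e, q):
    """alpha_{a,e}(q) = (qa=a and qi=e ? q : 0) for a constant amount q."""
    return lambda env: Fraction(q) if env["qa"] == a and env["qi"] == e(env) else Fraction(0)


def delta(stmt, p):
    """Relative permission difference, computed backwards like pre."""
    kind = stmt[0]
    if kind == "skip":
        return p
    if kind == "seq":                                   # Delta(s1, Delta(s2, p))
        return delta(stmt[1], delta(stmt[2], p))
    if kind == "assign":                                # p[e/x]
        _, x, e = stmt
        return lambda env: p({**env, x: e(env)})
    if kind == "exhale":                                # p - alpha_{a,e}(p')
        lost = alpha(*stmt[1:])
        return lambda env: p(env) - lost(env)
    if kind == "inhale":                                # p + alpha_{a,e}(p')
        gained = alpha(*stmt[1:])
        return lambda env: p(env) + gained(env)
    raise ValueError(f"unsupported statement: {kind}")


# Loop body of parCopyEven and its precondition pre(s, 0) as given in the text.
body = ("seq", ("exhale", "a", lambda env: 2 * env["j"], Fraction(1, 2)),
        ("seq", ("exhale", "a", lambda env: 2 * env["j"] + 1, 1),
         ("assign", "j", lambda env: env["j"] + 1)))
pre_body = lambda env: (alpha("a", lambda e: 2 * e["j"], Fraction(1, 2))(env)
                        + alpha("a", lambda e: 2 * e["j"] + 1, 1)(env))
diff = delta(body, lambda env: Fraction(0))

for qi in range(10):
    env = {"j": 2, "qa": "a", "qi": qi}
    assert pre_body(env) + diff(env) == 0   # the postcondition simplifies to 0
print(diff({"j": 2, "qa": "a", "qi": 5}))   # -1
```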

Since our Δ operator performs a backward analysis, our permission
postconditions are expressed in terms of the pre-state of the execution of s. To
obtain classical postconditions, any heap accesses need to refer to the pre-state 
heap, which can be achieved in program logics by using old expressions or log- 
ical variables. Formalizing the postcondition inference as a backward analysis 
simplifies our treatment of loops and has technical advantages over classical 
strongest-postconditions, which introduce existential quantifiers for assignment 
statements. A limitation of our approach is that our postconditions cannot cap- 
ture situations in which a statement obtains permissions to locations for which 
no pre-state expression exists, e.g. allocation of new arrays. Our postconditions 
are sound; to make them precise for such cases, our inference needs to be com- 
bined with an additional forward analysis, which we leave as future work. 


4 Handling Loops via Maximum Expressions 


In this section, we first focus on obtaining a sufficient permission precondition 
for the execution of a loop in isolation (independently of the code after it) and 
then combine the inference for loops with the one for loop-free code described 
above. 
4.1 Sufficient Permission Preconditions for Loops


A sufficient permission precondition for a loop guarantees the absence of permis- 
sion failures for a potentially unbounded number of executions of the loop body. 
This concept is different from a loop invariant: we require a precondition for 
all executions of a particular loop, but it need not be inductive. Our technique 
obtains such a loop precondition by projecting a permission precondition for a 
single loop iteration over all possible initial states for the loop executions. 


Exhale-Free Loop Bodies. We consider first the simpler (but common) case 
of a loop that does not contain exhale statements, e.g., a loop that does not
transfer permissions to a forked thread. The solution for this case is also sound for loop bodies
where each exhale is followed by an inhale for the same array location and at 
least the same permission amount, as in the encoding of most method calls. 

Consider a sufficient permission precondition p for the body of a loop 
while (b) { s }. By definition, p will denote sufficient permissions to execute 
s once; the precise locations to which p requires permission depend on the initial 
state of the loop iteration. For example, the sufficient permission precondition for 
the body of the copyEven method in Fig. 1,
$(j\%2=0\ ?\ (q_a=a \land q_i=j\ ?\ \mathit{rd} : 0) : (q_a=a \land q_i=j\ ?\ 1 : 0))$,
requires permissions to different array locations, depending on the
value of j. To obtain a sufficient permission precondition for the entire loop, we
leverage an over-approximating loop invariant $\mathcal{I}^+$ from an off-the-shelf
numerical analysis (e.g., [13]) to over-approximate all possible values of the numerical
variables that get assigned in the loop body, here, j. We can then express the
loop precondition using the pointwise maximum $\max_{j \mid \mathcal{I}^+ \land b}(p)$, over the values
of j that satisfy the condition $\mathcal{I}^+ \land b$. (The maximum over an empty range is
defined to be 0.) For the copyEven method, given the invariant $0 \le j \le \mathit{len}(a)$,
the loop precondition is $\max_{j \mid 0 \le j < \mathit{len}(a)}(p)$.

In general, a permission precondition for a loop body may also depend on 
array values, e.g., if those values are used in branch conditions. To avoid the 
need for an expensive array value analysis, we define both an over- and an
under-approximation of permission expressions, denoted $p^{\uparrow}$ and $p^{\downarrow}$ (cf. Sect. A.1 of the
TR [15]), with the guarantees that $p \le p^{\uparrow}$ and $p^{\downarrow} \le p$. These approximations
abstract away array-dependent conditions, and have an impact on precision only 
when array values are used to determine a location to be accessed. For exam- 
ple, a linear array search for a particular value accesses the array only up to 
the (a-priori unknown) point at which the value is found, but our permission 
precondition conservatively requires access to the full array. 
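One simple way to build such approximations, assumed here purely for illustration (the actual definitions are in Sect. A.1 of the TR and may differ), is to replace every conditional whose guard reads array values by the maximum (for $p^{\uparrow}$) or the minimum (for $p^{\downarrow}$) of its branches, flipping the direction for the right operand of a subtraction. The expression AST below is hypothetical.

```python
from fractions import Fraction

# Hypothetical permission-expression AST used only for this sketch:
#   ("const", q) | ("add", p1, p2) | ("sub", p1, p2) | ("max", p1, p2) | ("min", p1, p2)
#   ("cond", reads_array, guard, p1, p2), with guard(env, heap) -> bool


def evaluate(p, env, heap):
    tag = p[0]
    if tag == "const":
        return Fraction(p[1])
    if tag == "cond":
        _, _, guard, p1, p2 = p
        return evaluate(p1 if guard(env, heap) else p2, env, heap)
    left, right = evaluate(p[1], env, heap), evaluate(p[2], env, heap)
    if tag == "add":
        return left + right
    if tag == "sub":
        return left - right
    return max(left, right) if tag == "max" else min(left, right)


def approx(p, upper):
    """Over- (upper=True) or under-approximation without array-dependent guards."""
    tag = p[0]
    if tag == "const":
        return p
    if tag == "cond":
        _, reads_array, guard, p1, p2 = p
        a1, a2 = approx(p1, upper), approx(p2, upper)
        if reads_array:                  # branch unknown statically: bound both outcomes
            return ("max" if upper else "min", a1, a2)
        return ("cond", False, guard, a1, a2)
    if tag == "sub":                     # subtraction flips the bound of its right operand
        return ("sub", approx(p[1], upper), approx(p[2], not upper))
    return (tag, approx(p[1], upper), approx(p[2], upper))   # add, max, min are monotone
```

For a guard such as a[j] == key, as in the linear search example, the over-approximation keeps the larger of the two branch amounts, which mirrors the conservative precondition described above.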


Theorem 1. Let while (b) { s } be an exhale-free loop, let $\bar{x}$ be the integer
variables modified by s, and let $\mathcal{I}^+$ be a sound over-approximating numerical
loop invariant (over the integer variables in s). Then $\max_{\bar{x} \mid \mathcal{I}^+ \land b}(\mathit{pre}(s,0)^{\uparrow})$ is a
sufficient permission precondition for while (b) { s }.


Loops with Exhale Statements. For loops that contain exhale statements, 
the approach described above does not always guarantee a sufficient permission 
precondition. For example, if a loop gives away full permission to the same
array location in every iteration, our pointwise maximum construction yields a 
precondition requiring the full permission once, as opposed to the unsatisfiable 
precondition (since the loop is guaranteed to cause a permission failure). 

As explained above, our inference is sound if each exhale statement is fol- 
lowed by a corresponding inhale, which can often be checked syntactically. In 
the following, we present another decidable condition that guarantees soundness 
and that can be checked efficiently by an SMT solver. If neither condition holds, 
we preserve soundness by inferring an unsatisfiable precondition; we did not 
encounter any such examples in our evaluation. 

Our soundness condition checks that the maximum of the permissions 
required by two loop iterations is not less than the permissions required by exe- 
cuting the two iterations in sequence. Intuitively, that is the case when neither 
iteration removes permissions that are required by the other iteration. 


Theorem 2 (Soundness Condition for Loop Preconditions). Given a
loop while (b) { s }, let $\bar{x}$ be the integer variables modified in s and let $\bar{v}$ and $\bar{v}'$
be two fresh sets of variables, one for each of $\bar{x}$. Then $\max_{\bar{x} \mid \mathcal{I}^+ \land b}(\mathit{pre}(s,0)^{\uparrow})$ is a
sufficient permission precondition for while (b) { s } if the following implication
is valid in all states:

$$
(\mathcal{I}^+ \land b)[\bar{v}/\bar{x}] \land (\mathcal{I}^+ \land b)[\bar{v}'/\bar{x}] \land \Big(\bigvee_i v_i \neq v'_i\Big) \Rightarrow
\max\big(\mathit{pre}(s,0)^{\uparrow}[\bar{v}/\bar{x}],\ \mathit{pre}(s,0)^{\uparrow}[\bar{v}'/\bar{x}]\big) \geq \mathit{pre}\big(s, \mathit{pre}(s,0)^{\uparrow}[\bar{v}'/\bar{x}]\big)^{\uparrow}[\bar{v}/\bar{x}]
$$

The additional variables $\bar{v}$ and $\bar{v}'$ are used to model two arbitrary valuations of $\bar{x}$;
we constrain these to represent two initial states allowed by $\mathcal{I}^+ \land b$ and different
from each other for at least one program variable. We then require that the effect
of analysing each loop iteration independently and taking the maximum is not
smaller than the effect of sequentially composing the two loop iterations.

The theorem implicitly requires that no two different iterations of a loop observe exactly the same values for all integer variables. If that could be the case, the condition $\bigvee \bar{v} \neq \bar{v}'$ would cause us to ignore a potential pair of initial states for two different loop iterations. To avoid this problem, we assume that all loops satisfy this requirement; it can easily be enforced by adding an additional variable as a loop iteration counter [21].

For the parCopyEven method (Fig. 2), the soundness condition holds since, due to the $\bar{v} \neq \bar{v}'$ condition, the two terms on the right of the implication are equal for all values of $q_i$. We can thus infer the sufficient precondition $\max_{j \mid 0 \leq j < len(a)/2}((q_a = a \wedge q_i = 2*j \;?\; 1/2 : 0) + (q_a = a \wedge q_i = 2*j+1 \;?\; 1 : 0))$.
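The soundness condition can be explored on small, concrete cases. The following Python sketch is a finite analogue of Theorem 2, not the SMT-based check: it replaces the invariant by a concrete set of iterations, ignores the * operation, and assumes a simplified backward rule in which an exhale adds the transferred amount to the requirement and an inhale subtracts it (clamped at zero). The three loop bodies and all function names are hypothetical: one gives away full permission to a[0] in every iteration (the problematic example from the beginning of this subsection), one models the forking loop of parCopyEven as read off the precondition above, and one exhales and then re-inhales the same location.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical model: a loop body is a list of (kind, index_of, amount), where
# kind is "exhale" or "inhale", index_of maps the loop variable j to an index
# of array a, and amount is the fractional permission transferred.

def pre(body, j, post):
    """Permissions needed before one iteration (loop variable j) so that it runs
    without permission failure and still leaves at least `post` afterwards."""
    req = dict(post)
    for kind, index_of, amount in reversed(body):       # walk the body backwards
        loc = index_of(j)
        held = req.get(loc, Fraction(0))
        if kind == "exhale":
            req[loc] = held + amount                     # given away: needed up front
        else:
            req[loc] = max(Fraction(0), held - amount)   # obtained: needed less up front
    return req

def condition_holds(body, iterations):
    """Brute-force analogue of the Theorem 2 implication over finitely many iterations."""
    for j, k in permutations(iterations, 2):             # pairs of distinct initial states
        pj, pk = pre(body, j, {}), pre(body, k, {})
        sequential = pre(body, j, pk)                    # iteration j, then iteration k
        for loc in set(pj) | set(pk) | set(sequential):
            if max(pj.get(loc, 0), pk.get(loc, 0)) < sequential.get(loc, 0):
                return False
    return True

same_location = [("exhale", lambda j: 0, Fraction(1))]
fork_par_copy_even = [("exhale", lambda j: 2 * j, Fraction(1, 2)),
                      ("exhale", lambda j: 2 * j + 1, Fraction(1))]
exhale_then_inhale = [("exhale", lambda j: 0, Fraction(1)),
                      ("inhale", lambda j: 0, Fraction(1))]

print(condition_holds(same_location, range(3)))       # False
print(condition_holds(fork_par_copy_even, range(3)))  # True
print(condition_holds(exhale_then_inhale, range(3)))  # True
```

Only the first body violates the condition: running two of its iterations in sequence needs two full permissions to a[0], while the pointwise maximum provides only one.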


4.2 Permission Inference for Loops 


We can now extend the pre- and postcondition inference from Sect. 3 with loops. pre(while (b) { s }, p) must require permissions such that (1) the loop executes without permission failure and (2) at least the permissions described by p are held when the loop terminates. While the former is provided by the loop precondition as defined in the previous subsection, the latter also depends on the permissions gained or lost during the execution of the loop. To characterise these permissions, we extend the Δ operator from Sect. 3 to handle loops.

Under the soundness condition from Theorem 2, we can mimic the approach from the previous subsection and use over-approximating invariants to project out the permissions lost in a single loop iteration (where Δ(s,0) is negative) to those lost by the entire loop, using a maximum expression. This projection conservatively assumes that the permissions lost in a single iteration are lost by all iterations whose initial state is allowed by the loop invariant and loop condition. This approach is a sound over-approximation of the permissions lost.

However, for the permissions gained by a loop iteration (where Δ(s,0) is positive), this approach would be unsound, because the over-approximation includes iterations that may not actually happen and, thus, permissions that are not actually gained. For this reason, our technique handles gained permissions via an under-approximate¹ numerical loop invariant $I^-$ (e.g., [35]) and thus projects the gained permissions only over iterations that will surely happen.

This approach is reflected in the definition of our Δ operator below via d, which represents the permissions possibly lost or definitely gained over all iterations of the loop. In the former case, we have Δ(s,0) < 0 and, thus, the first summand is 0 and the computation based on the over-approximate invariant applies (note that the negated maximum of negated values is the minimum; we take the minimum over negative values). In the latter case (Δ(s,0) > 0), the second summand is 0 and the computation based on the under-approximate invariant applies (we take the maximum over positive values).
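The case split just described can be illustrated on concrete numbers. In this Python sketch, finite sets of loop-variable values stand in for the under- and over-approximate invariants (each conjoined with b), per-iteration effects are given as dictionaries from array index to Δ(s,0), the * operation is ignored, and an empty maximum is taken to be 0; all of these are our assumptions, and all names are hypothetical.

```python
from fractions import Fraction

def loop_delta(delta_of, under, over, locations):
    """Concrete analogue of d at each location: gains are projected over the
    iterations in `under` (standing in for I^- and b), losses over the
    iterations in `over` (standing in for I^+ and b). Empty maxima are 0."""
    d = {}
    for loc in locations:
        gained = max((max(Fraction(0), delta_of(j).get(loc, Fraction(0)))
                      for j in under), default=Fraction(0))
        lost = max((max(Fraction(0), -delta_of(j).get(loc, Fraction(0)))
                    for j in over), default=Fraction(0))
        d[loc] = gained - lost
    return d

def inhales_a_j(j):  # hypothetical body: inhale full permission to a[j]
    return {j: Fraction(1)}

def exhales_a_j(j):  # hypothetical body: exhale full permission to a[j]
    return {j: Fraction(-1)}

print(loop_delta(inhales_a_j, range(3), range(5), range(5)))  # gains only at 0, 1, 2
print(loop_delta(inhales_a_j, [], range(5), range(5)))        # no under-approximation: no gains
print(loop_delta(exhales_a_j, [], range(5), range(5)))        # a loss at every index 0..4
```

The second call mirrors the prototype's fallback of assuming the under-approximate invariant to be false when none is provided (Sect. 6): gained permissions are then simply not counted, which is sound but may be imprecise.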


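$$\Delta(\text{while (b) \{ s \}},\, p) = (b \;?\; d + p' : p), \quad \text{where:}$$

$$\begin{aligned}
d &= \max_{\bar{x} \mid I^- \wedge b} \max(0, \Delta(s,0))^* - \max_{\bar{x} \mid I^+ \wedge b} \max(0, -\Delta(s,0))^* \\
p' &= \max_{\bar{x} \mid I^- \wedge \neg b} \max(0, p)^* - \max_{\bar{x} \mid I^+ \wedge \neg b} \max(0, -p)^*
\end{aligned}$$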


Here, $\bar{x}$ again denotes the integer variables modified in s. The role of p' is to carry over the permissions p that are gained or lost by the code following the loop, taking into account any state changes performed by the loop. Intuitively, the maximum expressions replace the variables in p with expressions that do not depend on these variables but nonetheless reflect properties of their values right after the execution of the loop. For permissions gained, these properties are based on the under-approximate loop invariant to ensure that they hold for any possible loop execution. For permissions lost, we use the over-approximate invariant. For the loop in parCopyEven, we use the invariant $0 \leq j \leq len(a)/2$ to obtain $d = -\max_{j \mid 0 \leq j < len(a)/2}((q_a = a \wedge q_i = 2*j \;?\; 1/2 : 0) + (q_a = a \wedge q_i = 2*j+1 \;?\; 1 : 0))$. Since there are no statements following the loop, p and therefore p' are 0.

Using the same d term, we can now define the general case of pre for loops, combining (1) the loop precondition and (2) the permissions required by the code after the loop, adjusted by the permissions gained or lost during loop execution:


1 An under-approximate loop invariant must be true only for states that will actually 
be encountered when executing the loop. 

$$\mathit{pre}(\text{while (b) \{ s \}},\, p) = \Big(b \;?\; \max\Big(\max_{\bar{x} \mid I^+ \wedge b} \mathit{pre}(s,0)^*,\; \max_{\bar{x} \mid I^+ \wedge \neg b}(p^*) - d\Big) : p\Big)$$


Similarly to p' in the rule for Δ, the expression $\max_{\bar{x} \mid I^+ \wedge \neg b}(p^*)$ conservatively over-approximates the permissions required to execute the code after the loop. For method parCopyEven, we obtain a sufficient precondition that is the negation of Δ. Consequently, the postcondition is 0.
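A worked instance of the two rules for parCopyEven makes the last two sentences explicit. It is a sketch under three assumptions: the loop condition holds on entry (len(a)/2 > 0); the maximum of pre(s,0)^* over $I^+ \wedge b$ is the expression inferred for parCopyEven after Theorem 2, called M below; and the maximum of $p^*$ over $I^+ \wedge \neg b$ is 0 when p = 0:

$$\begin{aligned}
M &= \max_{j \mid 0 \leq j < len(a)/2}\big((q_a = a \wedge q_i = 2*j \;?\; 1/2 : 0) + (q_a = a \wedge q_i = 2*j+1 \;?\; 1 : 0)\big), \qquad d = -M, \qquad p' = 0 \\
\mathit{pre}(\text{loop}, 0) &= \max(M,\; 0 - d) = \max(M, M) = M \\
\mathit{pre}(\text{loop}, 0) + \Delta(\text{loop}, 0) &= M + (d + p') = M - M + 0 = 0
\end{aligned}$$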


Soundness. Our pre and Δ definitions yield a sound method for computing sufficient permission preconditions and guaranteed postconditions:


Theorem 3 (Soundness of Permission Inference). For any statement s, if every while loop in s either is exhale-free or satisfies the condition of Theorem 2, then pre(s,0) is a sufficient permission precondition for s, and pre(s,0) + Δ(s,0) is a corresponding guaranteed permission postcondition.


Our inference expresses pre- and postconditions using a maximum operator over an unbounded set of values. However, this operator is not supported by SMT solvers. To be able to use the inferred conditions for SMT-based verification, we provide an algorithm for eliminating these operators, as we discuss next.


5 A Maximum Elimination Algorithm 


We now present a new algorithm for replacing maximum expressions over an unbounded set of values (called pointwise maximum expressions in the following) with equivalent expressions containing no pointwise maximum expressions. Note that, technically, our algorithm computes solutions to $\max_{x \mid b \wedge p \geq 0}(p)$, since some optimisations exploit the fact that the permission expressions our analysis generates always denote non-negative values.


5.1 Background: Quantifier Elimination 


Our algorithm builds upon ideas from Cooper's classic quantifier elimination algorithm [11] which, given a formula $\exists x.b$ (where b is a quantifier-free Presburger formula), computes an equivalent quantifier-free formula b'. Below, we give a brief summary of Cooper's approach.

The problem is first reduced via boolean and arithmetic manipulations to a formula $\exists x.b$ in which x occurs at most once per literal and with no coefficient. The key idea is then to reduce $\exists x.b$ to a disjunction of two cases: (1) there is a smallest value of x making b true, or (2) b is true for arbitrarily small values of x.

In case (1), one computes a finite set of expressions S (the $b_j$ in [11]) guaranteed to include the smallest value of x. For each (in/dis-)equality literal containing x in b, one collects a boundary expression e which denotes a value for x making the literal true, while the value e − 1 would make it false. For example, for the literal y < x one generates the expression y + 1. If there are no (non-)divisibility constraints in b, by definition, S will include the smallest value of x making b true. To account for (non-)divisibility constraints such as x%2=0, the lowest common multiple δ of the divisors (and 1) is returned along with S; the guarantee is then that the smallest value of x making b true will be e + d for some e ∈ S and d ∈ [0, δ−1]. We use $\langle b \rangle_{small(x)}$ to denote the function handling this computation. Then, $\exists x.b$ can be reduced to $\bigvee_{e \in S,\, d \in [0, \delta-1]} b[e+d/x]$, where $(S, \delta) = \langle b \rangle_{small(x)}$.

In case (2), one can observe that the (in/dis-)equality literals in b will flip value at finitely many values of x, and so for sufficiently small values of x, each (in/dis-)equality literal in b will have a constant value (e.g., y > x will be true). By replacing these literals with these constant values, one obtains a new expression b' that is equal to b for small enough x and depends on x only via (non-)divisibility constraints. The value of b' will therefore actually be determined by x mod δ, where δ is the lowest common multiple of the divisors in the (non-)divisibility constraints. We use $\langle b \rangle_{-\infty(x)}$ to denote the function handling this computation. Then, $\exists x.b$ can be reduced to $\bigvee_{d \in [0, \delta-1]} b'[d/x]$, where $(b', \delta) = \langle b \rangle_{-\infty(x)}$.
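The two reductions can be sketched in Python for a tiny fragment, assuming that b is a conjunction of strict lower bounds, strict upper bounds and divisibility literals, and that every variable other than x is already fixed to an integer. Unlike Cooper's algorithm, which returns a quantifier-free formula over the remaining variables, this sketch therefore only decides concrete instances; the literal encoding and function names are hypothetical.

```python
from math import lcm  # Python 3.9+

# Hypothetical literal encoding over the eliminated variable x:
#   ("gt", c) means x > c,  ("lt", c) means x < c,  ("mod", m, r) means x % m == r

def holds(lit, x):
    if lit[0] == "gt":
        return x > lit[1]
    if lit[0] == "lt":
        return x < lit[1]
    return x % lit[1] == lit[2]

def sat(lits, x):
    return all(holds(lit, x) for lit in lits)

def exists_x(lits):
    """Decide 'exists x. b' via Cooper's two finite disjunctions."""
    delta = lcm(1, *(lit[1] for lit in lits if lit[0] == "mod"))
    # Case (1): boundary expressions S = {c + 1 | x > c in b}; test b[e + d / x].
    S = [lit[1] + 1 for lit in lits if lit[0] == "gt"]
    if any(sat(lits, e + d) for e in S for d in range(delta)):
        return True
    # Case (2): for very small x, 'x > c' is false and 'x < c' is true, so b'
    # is false if b has a lower bound; otherwise only the mod literals remain.
    if S:
        return False
    mods = [lit for lit in lits if lit[0] == "mod"]
    return any(sat(mods, d) for d in range(delta))

print(exists_x([("gt", 3), ("lt", 8), ("mod", 2, 0)]))  # True  (x = 4)
print(exists_x([("gt", 3), ("lt", 5), ("mod", 2, 1)]))  # False (only x = 4 fits the bounds)
print(exists_x([("lt", 0), ("mod", 3, 2)]))             # True  (e.g. x = -1)
```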

In principle, the maximum of a function $y = \max_x f(x)$ can be defined using two first-order quantifiers: $\forall x. f(x) \leq y$ and $\exists x. f(x) = y$. One might therefore be tempted to tackle our maximum elimination problem using quantifier elimination directly. We explored this possibility and found two serious drawbacks. First, the resulting formula does not yield a permission-typed expression that we can plug back into our analysis. Second, the resulting formulas are extremely large (e.g., for the copyEven example, this approach yields several pages of specifications) and hard to simplify, since relevant information is often spread across many terms due to the two separate quantifiers. Our maximum elimination algorithm addresses these drawbacks by natively working with arithmetic expressions, while mimicking the basic ideas of Cooper's algorithm and incorporating domain-specific optimisations.


5.2 Maximum Elimination 


The first step is to reduce the problem of eliminating general $\max_{x \mid b}(p)$ terms to those in which b and p come from a simpler, restricted grammar. These simple permission expressions p do not contain general conditional expressions $(b' \;?\; p_1 : p_2)$, but instead only those of the form $(b' \;?\; r : 0)$ (where r is a constant or rd). Furthermore, simple permission expressions only contain subtractions of the form $p - (b' \;?\; r : 0)$. This is achieved in a precursory rewriting of the input expression by, for instance, distributing pointwise maxima over conditional expressions and binary maxima. For example, the pointwise maximum term (part of the copyEven example)

$$\max_{j \mid 0 \leq j < len(a)}\big((j\%2=0 \;?\; (q_a=a \wedge q_i=j \;?\; rd : 0) : (q_a=a \wedge q_i=j \;?\; 1 : 0))\big)$$

will be reduced to:

$$\max\Big(\max_{j \mid 0 \leq j < len(a) \wedge j\%2=0}\big((q_a=a \wedge q_i=j \;?\; rd : 0)\big),\; \max_{j \mid 0 \leq j < len(a) \wedge j\%2 \neq 0}\big((q_a=a \wedge q_i=j \;?\; 1 : 0)\big)\Big)$$
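As a sanity check of this particular rewriting step (not of the rewriting procedure itself), the following Python sketch evaluates both forms for concrete values. It assumes $q_a = a$, integer values for $q_i$ and len(a) = n, and a hypothetical concrete value for the symbolic read amount rd; an empty maximum is taken to be 0.

```python
from fractions import Fraction

RD = Fraction(1, 10)  # hypothetical concrete stand-in for the symbolic amount rd

def original(qi, n):
    """max over 0 <= j < n of (j%2=0 ? (qi=j ? rd : 0) : (qi=j ? 1 : 0)), with q_a = a."""
    return max(((RD if qi == j else 0) if j % 2 == 0 else (1 if qi == j else 0)
                for j in range(n)), default=0)

def rewritten(qi, n):
    """The same maximum after distributing it over the conditional on j%2=0."""
    even = max((RD if qi == j else 0 for j in range(n) if j % 2 == 0), default=0)
    odd = max((1 if qi == j else 0 for j in range(n) if j % 2 != 0), default=0)
    return max(even, odd)

assert all(original(qi, n) == rewritten(qi, n) for n in range(8) for qi in range(-2, 10))
print(original(2, 6), original(3, 6), original(7, 6))  # 1/10 1 0
```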
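
$$
\begin{aligned}
\langle (b \;?\; p : 0) \rangle_{smallmax(x)} &= (T, \delta), \quad \text{where } (S, \delta) = \langle b \rangle_{small(x)},\; T = \{(e, true) \mid e \in S\} \\
\langle p_1 + p_2 \rangle_{smallmax(x)} &= (T_1 \cup T_2,\; lcm(\delta_1, \delta_2)) \\
&\quad \text{where } (T_1, \delta_1) = \langle p_1 \rangle_{smallmax(x)},\; (T_2, \delta_2) = \langle p_2 \rangle_{smallmax(x)} \\
\langle \max(p_1, p_2) \rangle_{smallmax(x)} &= \langle \min(p_1, p_2) \rangle_{smallmax(x)} = \langle p_1 + p_2 \rangle_{smallmax(x)} \quad \text{(as above)} \\
\langle p_1 - (b \;?\; p : 0) \rangle_{smallmax(x)} &= (T_1 \cup T_2,\; lcm(\delta_1, \delta_2)) \\
&\quad \text{where } (T_1, \delta_1) = \langle p_1 \rangle_{smallmax(x)},\; (S_2, \delta_2) = \langle \neg b \rangle_{small(x)},\; T_2 = \{(e, p_1 > 0) \mid e \in S_2\} \\[6pt]
\langle (p, b) \rangle_{smallmax(x)} &= (T_p \cup T_b,\; lcm(\delta_p, \delta_b)) \\
&\quad \text{where } (T_p, \delta_p) = \langle p \rangle_{smallmax(x)},\; (S_b, \delta_b) = \langle b \rangle_{small(x)},\; \delta' = lcm(\delta'_p, \delta'_b), \\
&\quad (b', \delta'_b) = \langle b \rangle_{-\infty(x)},\; (p', \delta'_p) = \langle p \rangle_{-\infty(x)}, \\
&\quad T_b = \Big\{\Big(e_b,\; \bigvee_{d \in [0, \delta'-1]} (\neg b' \wedge p' > 0)[d/x] \;\vee \bigvee_{\substack{(e_p, b_p) \in T_p \\ d_p \in [0, \delta_p - 1]}} (\neg b \wedge b_p)[(e_p + d_p)/x]\Big) \;\Big|\; e_b \in S_b\Big\}
\end{aligned}
$$

Fig. 5. Filtered boundary expression computation.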


Arbitrarily-Small Values. We exploit a high-level case split in our algorithm design analogous to Cooper's: given a pointwise maximum expression $\max_{x \mid b}(p)$, either a smallest value of x exists such that p has its maximal value (and b is true), or there are arbitrarily small values of x defining this maximal value. To handle the latter case, we define a completely analogous function $\langle p \rangle_{-\infty(x)}$, which recursively replaces all boolean expressions b' in p with $\langle b' \rangle_{-\infty(x)}$ as computed by Cooper; we relegate the definition to Sect. B.3 of the TR [15]. We then use $(b' \;?\; p' : 0)$, where $(b', \delta_1) = \langle b \rangle_{-\infty(x)}$ and $(p', \delta_2) = \langle p \rangle_{-\infty(x)}$, as our expression in this case. Note that this expression still depends on x if it contains (non-)divisibility constraints; Theorem 4 shows how x can be eliminated using $\delta_1$ and $\delta_2$.


Selecting Boundary Expressions for Maximum Elimination. Next, we consider the case of selecting an appropriate set of boundary expressions, given a $\max_{x \mid b}(p)$ term. We define this first for p in isolation, and then give an extended definition accounting for b. Just as for Cooper's algorithm, the boundary expressions must form a set guaranteed to include the smallest value of x defining the maximum value in question. The set must be finite, and should be as small as possible for the efficiency of our overall algorithm. We refine the notion of boundary expression and compute a set of pairs (e, b') of an integer expression e and its filter condition b': the filter condition represents an additional condition under which e must be included as a boundary expression. In particular, in contexts where b' is false, e can be ignored; this gives us a way to symbolically define an ultimately smaller set of boundary expressions, particularly in the absence of contextual information which might later show b' to be false. We call these pairs filtered boundary expressions.


Definition 1 (Filtered Boundary Expressions). The filtered boundary expression computation for x in p, written $\langle p \rangle_{smallmax(x)}$, returns a pair of a set T of pairs (e, b') and an integer constant δ, as defined in Fig. 5. This definition is also overloaded with a definition of filtered boundary expression computation for (x | b) in p, written $\langle (p, b) \rangle_{smallmax(x)}$.


Just as for Cooper's $\langle b \rangle_{small(x)}$ computation, our function $\langle p \rangle_{smallmax(x)}$ computes the set T of (e, b') pairs along with a single integer constant δ, which is the least common multiple of the divisors occurring in p; the desired smallest value of x may actually be some e + d where d ∈ [0, δ−1]. There are three key points to Definition 1 which ultimately make our algorithm efficient.

First, the case for $\langle (b \;?\; p : 0) \rangle_{smallmax(x)}$ only includes boundary expressions for making b true. The case of b being false (from the structure of the permission expression) is not relevant for trying to maximise the permission expression's value (note that this case will never apply under a subtraction operator, due to our simplified grammar and because the case for subtraction does not recurse into the right-hand operand).

Second, the case for $\langle p_1 - (b \;?\; p : 0) \rangle_{smallmax(x)}$ dually only considers boundary expressions for making b false (along with the boundary expressions for maximising $p_1$). The filter condition $p_1 > 0$ is used to drop the boundary expressions for making b false: in case $p_1$ is not strictly positive, we know that the evaluation of the whole permission expression will not yield a strictly positive value, and hence is not an interesting boundary value for a non-negative maximum.

Third, in the overloaded definition of $\langle (p, b) \rangle_{smallmax(x)}$, we combine boundary expressions for p with those for b. The boundary expressions for b are, however, superfluous if, in analysing p, we have already determined a value for x which maximises p and happens to satisfy b. If all boundary expressions for p (whose filter conditions are true) make b true, and all non-trivial (i.e., strictly positive) evaluations of $\langle p \rangle_{-\infty(x)}$ used for potentially defining p's maximum value also satisfy b, then we can safely discard the boundary expressions for b.

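We are now ready to reduce pointwise maximum expressions to equivalent maximum expressions over finitely many cases:

Theorem 4 (Simple Maximum Expression Elimination). For any pair (p, b), if $\models p \geq 0$, then we have:

$$\models \max_{x \mid b} p = \max\Big(\max_{\substack{(e, b'') \in T \\ d \in [0, \delta - 1]}} \big((b'' \wedge b[e+d/x]) \;?\; p[e+d/x] : 0\big),\; \max_{d \in [0,\, lcm(\delta_1, \delta_2) - 1]} \big(b'[d/x] \;?\; p'[d/x] : 0\big)\Big)$$

where $(T, \delta) = \langle (p, b) \rangle_{smallmax(x)}$, $(b', \delta_1) = \langle b \rangle_{-\infty(x)}$ and $(p', \delta_2) = \langle p \rangle_{-\infty(x)}$.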


Table 1. Experimental results. For each program, we list the lines of code and the number of loops (in brackets the nesting depth). We report the relative size of the inferred specifications compared to hand-written specifications, and whether the inferred specifications are precise (✓ precise, ✗ imprecise; a star next to the tick indicates that the inferred specification is slightly more precise than the hand-written one). Inference times are given in ms.

Program       LOC  Loops  Size  Prec.  Time   | Program       LOC  Loops  Size  Prec.  Time
addLast        12  1(1)   1.9   ✓       21    | initPartBug    19  2(1)   1.5   ✓       31
append         13  1(1)   1.9   ✓       32    | insertSort     21  2(2)   2.5   ✓*      35
array1         17  2(2)   0.9   ✗       28    | javaBubble     24  2(2)   2.3   ✓*      32
array2         23  3(2)   0.9   ✗       35    | knapsack       21  2(2)   1.3   ✗       45
array3         23  2(2)   1.1   ✓       24    | lis            37  4(2)   4.2   ✓       ?
arrayRev       18  1(1)   3.2   ✓*      28    | matrixmult     33  3(3)   1.5   ✓       78
bubbleSort     23  2(2)   1.8   ✓*      34    | mergeinter     23  2(1)   3.4   ✗       56
copy           16  2(1)   1.6   ✓       27    | mergeintbug    23  2(1)   2.6   ✗       59
copyEven       17  1(1)   1.6   ✓*      27    | memcopy        16  2(1)   1.6   ✓       28
copyEven2      14  1(1)   1.4   ✗       20    | multarray      26  2(2)   2.1   ✓       40
copyEven3      14  1(1)   2.2   ✓*      23    | parcopy        20  2(1)   1.2   ✓       30
copyOdd        21  2(1)   2.4   ✓       55    | pararray       20  2(1)   1.2   ✓       31
copyOddBug     19  2(1)   7.1   ✗       57    | parCopyEven    22  2(1)   5.0   ✓*      79
copyPart       17  2(1)   1.7   ✓       30    | parMatrix      35  4(2)   1.1   ✓       80
countDown      21  3(2)   1.1   ✓       32    | parNested      31  4(2)   0.5   ✗       57
diff           31  2(2)   2.0   ✗       70    | relax          33  1(1)   1.4   ✓*      55
find           19  1(1)   3.0   ✓       43    | reverse        21  2(1)   3.9   ✓       42
findNonNull    19  1(1)   3.0   ✓       40    | reverseBug     21  2(1)   1.7   ✓       42
init           18  2(1)   1.1   ✓       28    | sanfoundry     27  2(1)   2.1   ✓       37
init2d         23  2(2)   2.1   ✓       52    | selectSort     26  2(2)   1.0   ✗       38
initEven       18  2(1)   0.9   ✗       26    | strCopy        16  2(1)   0.9   ✗        2
initEvenbug    18  2(1)   1.5   ✗       28    | strLen         10  1(1)   0.8   ✗       15
initNonCnst    18  2(1)   1.1   ✓*      27    | swap           15  1(1)   1.5   ✓       19
initPart       19  2(1)   1.1   ✓       30    | swapBug        15  1(1)   1.5   ✓       19

To see how our filter conditions help to keep the set T (and therefore the first iterated maximum on the right of the equality in the above theorem) small, consider the example $\max_{x \mid x \geq 0}((x=i \;?\; 1 : 0))$ (so p is $(x=i \;?\; 1 : 0)$, while b is $x \geq 0$). In this case, evaluating $\langle (p, b) \rangle_{smallmax(x)}$ yields the set $T = \{(i, true), (0, i < 0)\}$, with the meaning that the boundary expression i is considered in all cases, while the boundary expression 0 is only of interest if i < 0. The first iterated maximum term would be $\max((true \wedge i \geq 0 \;?\; (i=i \;?\; 1 : 0) : 0),\; (i < 0 \wedge 0 \geq 0 \;?\; (0=i \;?\; 1 : 0) : 0))$. We observe that the term corresponding to the boundary value 0 can be simplified to 0, since it contains the two contradictory conditions i < 0 and 0 = i. Thus, the entire maximum can be simplified to $(i \geq 0 \;?\; 1 : 0)$. Without the filter conditions, the result would instead be $\max((i \geq 0 \;?\; 1 : 0),\; (0=i \;?\; 1 : 0))$. In the context of our permission analysis, the filter conditions allow us to avoid generating boundary expressions corresponding, e.g., to the integer loop invariants, provided that the expressions generated by analysing the permission expression in question already suffice. We employ aggressive syntactic simplification of the resulting expressions in order to exploit these filter conditions to produce succinct final answers.
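The worked example $\max_{x \mid x \geq 0}((x=i \;?\; 1 : 0))$ can be checked numerically. In the Python sketch below, the set T is copied from the text rather than computed, δ is 1, and a finite window of x values stands in for the unbounded maximum (adequate for the tested range −10 ≤ i < 10); the function names are hypothetical.

```python
def pointwise_max(i, window=range(-50, 50)):
    """max over x | x >= 0 of (x = i ? 1 : 0), evaluated over a finite window of x."""
    return max((1 if x == i else 0 for x in window if x >= 0), default=0)

def theorem4_rhs(i):
    """Right-hand side of Theorem 4 with T = {(i, true), (0, i < 0)} and delta = 1."""
    T = [(i, True), (0, i < 0)]
    first = max(((1 if e == i else 0) if (flt and e >= 0) else 0) for e, flt in T)
    second = 0  # b' for x >= 0 at arbitrarily small x is false, so this maximum is 0
    return max(first, second)

def simplified(i):
    return 1 if i >= 0 else 0

for i in range(-10, 10):
    assert pointwise_max(i) == theorem4_rhs(i) == simplified(i)
```

All three agree, and the filter condition i < 0 is what lets the term for the boundary value 0 disappear after simplification.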


6 Implementation and Experimental Evaluation 


We have developed a prototype implementation of our permission inference. The tool is written in Scala and accepts programs written in the Viper language [38], which provides all the features needed for our purposes.

Given a Viper program, the tool first performs a forward numerical analysis to infer the over-approximate loop invariants needed for our handling of loops. The implementation is parametric in the numerical abstract domain used for the analysis; we currently support the abstract domains provided by the APRON library [24]. As we have yet to integrate the implementation of under-approximate invariants (e.g., [35]), we rely on user-provided invariants, or assume them to be false if none are provided. In a second step, our tool performs the inference and maximum elimination. Finally, it annotates the input program with the inferred specification.

We evaluated our implementation on 48 programs taken from various sources; included are all programs from the array memory safety category of SV-COMP 2017 that do not contain strings, all programs from Dillig et al. [14] (except three examples involving arrays of arrays), loop parallelisation examples from VerCors [5], and a few programs that we crafted ourselves. We manually checked that our soundness condition holds for all considered programs. The parallel loop examples were encoded as two consecutive loops, where the first one models the forking of one thread per loop iteration (by iteratively exhaling the permissions required for all loop iterations), and the second one models the joining of all these threads (by inhaling the permissions that are left after each loop iteration). For the numerical analysis, we used the polyhedra abstract domain provided by APRON. The experiments were performed on a dual-core machine with a 2.60 GHz Intel Core i7-6600U CPU, running Ubuntu 16.04.
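The shape of the fork/join encoding of parallel loops can be illustrated by executing it on concrete permission amounts. The Python sketch below is hypothetical throughout: it tracks permissions per array index in a dictionary, takes what each thread leaves behind as a given input rather than inferring it, and is not one of the Viper programs used in the evaluation.

```python
from fractions import Fraction

class PermissionFailure(Exception):
    """Raised when an exhale asks for more permission than is currently held."""

def exhale(held, loc, amount):
    if held.get(loc, Fraction(0)) < amount:
        raise PermissionFailure(f"insufficient permission to a[{loc}]")
    held[loc] = held.get(loc, Fraction(0)) - amount

def inhale(held, loc, amount):
    held[loc] = held.get(loc, Fraction(0)) + amount

def parallel_loop(held, n, needs, leftover):
    """Fork/join encoding of a parallel loop with iterations 0..n-1."""
    for j in range(n):                    # fork: exhale what each thread needs
        for loc, amount in needs(j).items():
            exhale(held, loc, amount)
    for j in range(n):                    # join: inhale what each thread leaves
        for loc, amount in leftover(j).items():
            inhale(held, loc, amount)
    return held

def needs(j):
    # Hypothetical per-thread requirement, shaped like parCopyEven's precondition.
    return {2 * j: Fraction(1, 2), 2 * j + 1: Fraction(1)}

full = parallel_loop({k: Fraction(1) for k in range(6)}, 3, needs, needs)
print(all(v == 1 for v in full.values()))  # True: every thread hands everything back

try:
    parallel_loop({k: Fraction(1, 2) for k in range(6)}, 3, needs, needs)
except PermissionFailure as err:
    print(err)  # insufficient permission to a[1]
```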

An overview of the results is given in Table 1. For each program, we compared the size and precision of the inferred specification with respect to hand-written ones. The running times were measured by first running the analysis 50 times to warm up the JVM and then computing the average time needed over the next 100 runs. The results show that the inference is very efficient. The inferred specifications are concise for the vast majority of the examples. In 35 out of 48 cases, our analysis inferred precise specifications. Most of the imprecisions are due to the inferred numerical loop invariants; in all of these cases, manually strengthening the invariants yields a precise specification. In one example, the source of imprecision is our abstraction of array-dependent conditions (see Sect. 4).


7 Related Work 


Much work is dedicated to the analysis of array programs, but most of it focuses on array content, whereas we infer permission specifications. The simplest approach consists of “smashing” all array elements into a single memory location [4]. This is generally quite imprecise, as only weak updates can be performed on the smashed array. A simple alternative is to consider array elements as distinct variables [4], which is feasible only when the length of the array is statically known. More advanced approaches perform syntax-based [18,22,25] or semantics-based [12,34] partitions of an array into symbolic segments. These require segments to be contiguous (with the exception of [34]) and, unlike our approach, do not easily generalise to multidimensional arrays. Gulwani et al. [20] propose an approach for inferring quantified invariants for arrays by lifting quantifier-free abstract domains. Their technique requires templates for the invariants.

Dillig et al. [14] avoid an explicit array partitioning by maintaining constraints that over- and under-approximate the array elements being updated by a program statement. Their work employs a technique for directly generalising the analysis of a single loop iteration (based on quantifier elimination), which works well when different loop iterations write to disjoint array locations. Gedell and Hähnle [17] provide an analysis which uses a similar criterion to determine that it is safe to parallelise a loop and treat its heap updates as one bulk effect. The condition for our projection over loop iterations is weaker, since it allows the same array location to be updated in multiple loop iterations (as, for example, in sorting algorithms). Blom et al. [5] provide a specification technique for a variety of parallel loop constructs; our work can infer the specifications which their technique requires to be provided.

Another alternative for generalising the effect of a loop iteration is to use a first-order theorem prover, as proposed by Kovács and Voronkov [28]. In their work, however, they did not consider nested loops or multidimensional arrays. Other works rely on loop acceleration techniques [1,7]. In particular, like ours, the work of Bozga et al. [7] does not synthesise loop invariants; they directly infer postconditions of loops with respect to given preconditions, while we additionally infer the preconditions. The acceleration technique proposed in [1] is used for the verification of array programs in the tool BOOSTER [2].

Monniaux and Gonnord [36] describe an approach for the verification of array programs via a transformation to array-free Horn clauses. Chakraborty et al. [10] use heuristics to determine the array accesses performed by a loop iteration and split the verification of an array invariant accordingly. Their non-interference condition between loop iterations is similar to, but stronger than, our soundness condition (cf. Sect. 4). Neither work is concerned with specification inference.

A wide range of static/shape analyses employ tailored separation logics as abstract domain (e.g., [3,9,19,29,41]); these works handle recursively defined data structures such as linked lists and trees, but not random-access data structures such as arrays and matrices. Of these, Gulavani et al. [19] is perhaps closest to our work: they employ an integer-indexed domain for describing recursive data structures. It would be interesting to combine our work with such separation logic shape analyses. The problems of automating biabduction and entailment checking for array-based separation logics have recently been studied by Brotherston et al. [8] and Kimura and Tatsuta [27], but have not yet been extended to handle loop-based or recursive programs.


8 Conclusion and Future Work 


We presented a precise and efficient permission inference for array programs. Although our inferred specifications contain redundancies in some cases, they are human readable. Our approach integrates well with permission-based inference for other data structures and with permission-based program verification.


As future work, we plan to use SMT solving to further simplify our inferred specifications, to support arrays of arrays, and to extend our work to an interprocedural analysis and explore its combination with biabduction techniques.

Abstract. We study, from a computability perspective, static program analysis, namely detecting sound program assertions, and verification, namely sound checking of program assertions. We first design a general computability model for domains of program assertions and corresponding program analysers and verifiers. Next, we formalize and prove an instantiation of Rice's theorem for static program analysis and verification. Then, within this general model, we provide and show a precise statement of the popular belief that program analysis is a harder problem than program verification: we prove that for finite domains of program assertions, program analysis and verification are equivalent problems, while for infinite domains, program analysis is strictly harder than verification.


1 Introduction 


It is common to assume that program analysis is harder than program verification (e.g. [1,17,22]). The intuition is that this happens because in program analysis we need to synthesize a correct program invariant, while in program verification we just have to check whether a given program invariant is correct. The distinction between checking a proof and computing a witness for that proof can be traced back to Leibniz [18] in his ars iudicandi and ars inveniendi, respectively representing the analytic and synthetic method. In Leibniz's ars combinatoria, the ars inveniendi is defined as the art of discovering “correct” questions, while ars iudicandi is defined as the art of discovering “correct” answers. These foundational aspects of mathematical reasoning have a peculiar meaning when dealing with questions and answers concerning the behaviour of computer programs as objects of our investigation.

Our main goal is to define a general and precise model for reasoning on the computability aspects of the notions of (sound or complete) static analyser and verifier for generic programs (viz. Turing machines). Both static analysers and verifiers assume a given domain A of abstract program assertions, which may range from syntactic program properties (e.g., program sizes or LOCs) to complexity properties (e.g., number of execution steps in some abstract machine) and all the semantic properties of the program behaviour (e.g., value range of program variables or shape of program memories). A program analyser is defined to be any total computable (i.e., total recursive) function that, for any program P, returns an assertion $a_P$ in A; it is sound when the concrete meaning of the assertion $a_P$ includes P. Instead, a program verifier is a (total) decision procedure which is capable of checking whether a given program P satisfies a given assertion a ranging in A, answering “true” or “don't know”; it is sound when a positive check of a for P means that the concrete meaning of the assertion a includes P. Completeness, which coupled with soundness is here called precision, holds for a program analyser when, for any program P, it returns the strongest assertion in A for P, while a program verifier is called precise if it is able to prove any true assertion in A for a program P. This general and minimal model allows us to extend some standard results and methods of computability theory to static program analysis and verification. We provide an instance of the well-known Rice's Theorem [29] for generic analysers and verifiers, by proving that sound and precise analysers (resp. verifiers) exist only for trivial domains of assertions. This allows us to generalise known results about the undecidability of program analysis, such as the undecidability of the meet over all paths (MOP) solution for monotone dataflow analysis frameworks [15], making them independent of the structure of the domain of assertions.

Then, we define a model for comparing the relative “verification power” of program analysers and verifiers. In this model, a verifier V on a domain A of assertions is more precise than an analyser $\mathcal{A}$ on the same domain A when any assertion a in A which can be proved by $\mathcal{A}$ for a program P (this means that the output $\mathcal{A}(P)$ of the analyser is stronger than the assertion a) can also be proved by V. Conversely, $\mathcal{A}$ is more precise than V when any assertion a proved by V can also be proved by $\mathcal{A}$. We prove that while it is always possible to constructively transform a program analyser into an equivalent verifier (i.e., one with the same verification power), the converse does not hold in general. In fact, we first show that for finite domains of assertions, any “reasonable” verifier can be constructively transformed into an equivalent analyser, where reasonable means that the verifier V is: (i) nontrivial: for any program, V is capable of proving some assertion, possibly a trivially true assertion; (ii) monotone: if V proves an assertion a and a is stronger than a', then V is also capable of proving a'; (iii) logically meet-closed: if V proves both $a_1$ and $a_2$ and the logical conjunction $a_1 \wedge a_2$ is a representable assertion, then V is also capable of proving it. Next, we prove the following impossibility result: for any infinite abstract domain of assertions A, no constructive reduction from reasonable verifiers on A to equivalent analysers on A is possible. This provides, to the best of our knowledge, the first formalization of the common folklore that program analysis is harder than program verification.
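To make the two constructions concrete, here is a toy Python sketch over a hypothetical finite, meet-closed domain of sign assertions, where a smaller set of possible output signs is a stronger assertion. Programs are opaque names and the analyser is a lookup table, so nothing here touches the undecidability results; the sketch only shows why the finite case works under the nontriviality, monotonicity and meet-closure assumptions: the analyser built from a verifier enumerates the whole domain, which this construction cannot do when the domain is infinite.

```python
from functools import reduce
from itertools import combinations

SIGNS = ("neg", "zero", "pos")
# Hypothetical finite domain: every subset of SIGNS; intersection is conjunction.
DOMAIN = [frozenset(c) for k in range(len(SIGNS) + 1) for c in combinations(SIGNS, k)]

def verifier_from_analyser(analyse):
    """V(P, a) answers true iff the analyser's result for P implies a."""
    return lambda prog, a: analyse(prog) <= a

def analyser_from_verifier(verify):
    """Finite domains only: return the conjunction of all assertions V proves.
    Assumes V is nontrivial (the list is never empty) and meet-closed."""
    def analyse(prog):
        proved = [a for a in DOMAIN if verify(prog, a)]
        return reduce(frozenset.intersection, proved)
    return analyse

# Toy analyser: a lookup table standing in for some total analysis.
TABLE = {"abs": frozenset({"zero", "pos"}),
         "neg_abs": frozenset({"neg", "zero"}),
         "one": frozenset({"pos"})}
V = verifier_from_analyser(TABLE.__getitem__)
A = analyser_from_verifier(V)
print(all(A(p) == TABLE[p] for p in TABLE))  # True: the round trip is lossless
```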


2 Background 


We follow the standard terminology and notation for sets and computable functions in recursion theory (e.g., [12,26,30]). If X and Y are sets, then $X \to Y$ and $X \rightharpoonup Y$ denote, respectively, the set of all total and partial functions from X to Y. If $f : X \rightharpoonup Y$, then $f(x)\downarrow$ and $f(x)\uparrow$ mean that f is defined/undefined on $x \in X$. Hence $dom(f) = \{x \in X \mid f(x)\downarrow\}$. If $S \subseteq Y$, then $f(x) \in S$ denotes the implication $f(x)\downarrow \Rightarrow f(x) \in S$. If $f, g : X \rightharpoonup Y$, then $f = g$ means that $dom(f) = dom(g)$ and, for any $x \in dom(f) = dom(g)$, $f(x) = g(x)$. The set of all partial (total) recursive functions on natural numbers is denoted by $\mathbb{N} \rightharpoonup \mathbb{N}$ ($\mathbb{N} \to \mathbb{N}$). Recall that $A \subseteq \mathbb{N}$ is a recursively enumerable (r.e., or semidecidable) set if $A = dom(f)$ for some $f \in \mathbb{N} \rightharpoonup \mathbb{N}$, while $A \subseteq \mathbb{N}$ is a recursive (or decidable) set if both A and its complement $\bar{A} = \mathbb{N} \setminus A$ are recursively enumerable, and this happens when there exists $f \in \mathbb{N} \to \mathbb{N}$ such that $f = \lambda n.\, n \in A \;?\; 1 : 0$.

Let Prog denote some deterministic programming language which is Turing complete. More precisely, this means that for any partial recursive function $f : \mathbb{N} \rightharpoonup \mathbb{N}$ there exists a program $P \in \mathrm{Prog}$ such that $[\![P]\!] \cong f$, where $[\![P]\!] : D \rightharpoonup D$ is a denotational input/output semantics of $P$ on a domain $D$ of input/output values for Prog, where undefinedness encodes nontermination and $\cong$ means equality up to some recursive encoding $enc : D \to \mathbb{N}$ and decoding $dec : \mathbb{N} \to D$ functions, i.e., $f = enc \circ [\![P]\!] \circ dec$. We also assume a small-step transition relation $\Rightarrow \;\subseteq (\mathrm{Prog} \times D) \times ((\mathrm{Prog} \times D) \cup D)$ for Prog defining an operational semantics which is functionally equivalent to the denotational semantics: $\langle P, i\rangle \Rightarrow^* o$ iff $[\![P]\!]i = o$. By an abuse of notation, we will identify the input/output semantics of a program $P$ with the partial recursive function computed by $P$, i.e., we will consider programs $P \in \mathrm{Prog}$ whose input/output semantics is a partial recursive function $[\![P]\!] : \mathbb{N} \rightharpoonup \mathbb{N}$, so that, by Turing completeness, $\{[\![P]\!] : \mathbb{N} \rightharpoonup \mathbb{N} \mid P \in \mathrm{Prog}\} = \mathbb{N} \rightharpoonup \mathbb{N}$.


3 Abstract Domains 


Static program analysis and verification are always defined with respect to a given (denumerable) domain of program assertions, which we call here an abstract domain [7], where the meaning of assertions is formalized by a function that induces a logical implication relation between assertions.


Definition 3.1 (Abstract Domain). An abstract domain is a tuple $\langle A, \gamma, \le_\gamma\rangle$ such that:

(1) $A$ is any denumerable set;
(2) $\gamma : A \to \wp(\mathrm{Prog})$ is any function;
(3) $\le_\gamma \,\triangleq \{(a_1, a_2) \in A \times A \mid \gamma(a_1) \subseteq \gamma(a_2)\}$ is a decidable relation.

An abstract element $a \in A$ such that $\gamma(a) = \mathrm{Prog}$ is called an abstract top, while $a$ is called an abstract bottom when $\gamma(a) = \varnothing$.


The elements of $A$ are called assertions or abstract values, $\gamma$ is called the concretization function (this may also be a nonrecursive function, which is typical of abstract domains representing semantic program properties), and $\le_\gamma$ is called the implication or approximation relation of $A$. Thus, in this general model, a program assertion $a \in A$ plays the role of some abstract representation of any program property $\gamma(a) \in \wp(\mathrm{Prog})$, while the comparison relation $a_1 \le_\gamma a_2$ holds when $a_1$ is a stronger (or more precise) property than $a_2$. Let us also observe that, as a limit case, Definition 3.1 allows an abstract domain to be empty, that is, the tuple $\langle \varnothing, \varnothing, \varnothing\rangle$ satisfies the definition of abstract domain, where $\varnothing$ denotes the empty set, the empty function (i.e., the unique subset of $\varnothing \times \varnothing$) and the empty relation.


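Example 3.2. Let us give some simple examples of abstract domains.

(1) Consider $A = \mathbb{N}$ with $\gamma(n) \triangleq \{P \in \mathrm{Prog} \mid \mathit{size}(P) \le n\}$, where $\mathit{size} : \mathrm{Prog} \to \mathbb{N}$ is some computable program size function. Here, $\le_\gamma$ is clearly decidable and coincides with the partial order $\le_{\mathbb{N}}$ on numbers.

(2) Consider $A = \mathbb{N}$ with $\gamma(n) \triangleq \{P \in \mathrm{Prog} \mid \forall i.\, \exists o, k.\, \langle P, i\rangle \Rightarrow^k o \;\&\; k \le n\}$, i.e., $n$ represents all the programs which, given any input, terminate in at most $n$ steps. Here again, $n \le_\gamma m$ iff $n \le_{\mathbb{N}} m$, so that $\le_\gamma$ is decidable.

(3) Consider $A = \mathbb{N}$ with $\gamma(n) \triangleq \{P \in \mathrm{Prog} \mid \forall i \in [0, n].\, \exists o.\, \langle P, i\rangle \Rightarrow^* o\}$, that is, $n$ represents all the programs which terminate for any input $i \le n$. Here larger indices give stronger assertions, so that $n \le_\gamma m$ iff $m \le_{\mathbb{N}} n$, and $\le_\gamma$ is again decidable.

(4) Consider $A = \mathbb{N}$ with $\gamma(n) \triangleq \{P \in \mathrm{Prog} \mid \forall i \in \mathbb{N}.\, [\![P]\!](i) = o \Rightarrow o \le n\}$, that is, $n$ represents those programs which, in case of termination, give an output $o$ bounded by $n$. Again, $n \le_\gamma m$ iff $n \le_{\mathbb{N}} m$.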

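(5) Consider $A = \mathbb{N} \rightharpoonup \mathbb{N}$ with $\gamma(g) \triangleq \{P \in \mathrm{Prog} \mid \forall i.\, (g(i)\downarrow \Rightarrow \exists o, k.\, \langle P, i\rangle \Rightarrow^k o,\ k \le g(i)) \wedge ((\exists o, k.\, \langle P, i\rangle \Rightarrow^k o) \Rightarrow g(i)\downarrow,\ k \le g(i))\}$, that is, $g$ represents those programs whose time complexity is bounded by the function $g$. Here, $g \le_\gamma g'$ iff $\forall i.\, (g(i)\downarrow \Leftrightarrow g'(i)\downarrow) \wedge (g(i)\downarrow \Rightarrow g(i) \le g'(i))$.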


Definition 3.1 does not require injectivity of the concretization function $\gamma$, thus multiple assertions could have the same meaning. Two abstract values $a_1, a_2 \in A$ are called equivalent when $\gamma(a_1) = \gamma(a_2)$. Let us observe that since $\le_\gamma$ is required to be decidable, the equivalence $\gamma(a_1) = \gamma(a_2)$ is decidable as well. For example, for the well-known numerical abstract domain of convex polyhedra [11] represented through linear constraints between program variables, we may well have multiple representations $P_1$ and $P_2$ for the same polyhedron, e.g., $P_1 = \{x = z, z \le y\}$ and $P_2 = \{x = z, x \le y\}$ both represent the same polyhedron. Thus, in general, an abstract domain $A$ is not required to be partially ordered by $\le_\gamma$. On the other hand, the relation $\le_\gamma$ is clearly a preorder on $A$. The only basic requirement is that for any pair of abstract values $a_1, a_2 \in A$, one can decide if $a_1$ is a more precise program assertion than $a_2$, i.e., if $\gamma(a_1) \subseteq \gamma(a_2)$ holds. In this sense we do not require that a partial order $\le$ is defined a priori on $A$ and that $\gamma$ is monotone w.r.t. $\le$, since for our purposes it is enough to consider the preorder $\le_\gamma$ induced by $\gamma$. If instead $A$ is endowed with a partial order $\le_A$ and $A$ is defined in abstract interpretation [7,8] through a Galois insertion based on the concretization map $\gamma$, then it turns out that $\gamma(a_1) \subseteq \gamma(a_2) \Leftrightarrow a_1 \le_A a_2$ holds, so that the decidability of the relation $\le_\gamma = \{(a_1, a_2) \in A \times A \mid \gamma(a_1) \subseteq \gamma(a_2)\}$ boils down to the decidability of the partial order relation $\le_A$. As an example, it is well known that the abstract domain of polyhedra does not admit a Galois insertion [11]; nevertheless its induced preorder relation $\le_\gamma$ is decidable: for polyhedra represented by linear constraints, there exist algorithms for deciding if $\gamma(P_1) \subseteq \gamma(P_2)$ for any pair of convex polyhedra representations $P_1$ and $P_2$ (see e.g. [23, Sect. 5.3]).


3.1 Abstract Domains in Abstract Interpretation 


An abstract domain in standard abstract interpretation [7-9] is usually defined by a poset $\langle A, \le_A\rangle$ containing a top element $\top \in A$ and a concretization map $\gamma_A : A \to \wp(\mathit{Dom})$, where Dom denotes some concrete semantic domain (e.g., program stores or program traces), such that: (a) $A$ is machine representable, namely the abstract elements of $A$ are encoded by some data structures (e.g., tuples, vectors, lists, matrices, etc.), and some algorithms are available for deciding if $a_1 \le_A a_2$ holds; (b) $a_1 \le_A a_2 \Leftrightarrow \gamma_A(a_1) \subseteq \gamma_A(a_2)$ holds (this equivalence always holds for Galois insertions); (c) $\gamma_A(\top) = \mathit{Dom}$. Let us point out that Definition 3.1 is very general since the concretization of an abstract value can be any program property, possibly a purely syntactic property or some space or time complexity property, as in the simple cases of Example 3.2 (1)-(2)-(5).

Let $\gamma_A : A \to \wp(\mathit{Dom})$ and assume that Dom is defined by program stores, namely $\mathit{Dom} \triangleq \mathit{Var} \to \mathit{Val}$, where Var is a finite set of program variables and Val is a corresponding denumerable set of values. Since $\mathit{Var} \to \mathit{Val}$ has a finite domain and a denumerable range, we can assume a recursive encoding of finite tuples of values into natural numbers $\mathbb{N}$, i.e. $\mathit{Var} \to \mathit{Val} \cong \mathbb{N}$, and define $\gamma_A : A \to \wp(\mathbb{N})$. This is equivalent to assuming that programs have one single variable, say $x$, which may assume tuples of values in Val. A set of numbers $\gamma_A(a) \in \wp(\mathbb{N})$ is meant to represent a property of the values stored in the program variable $x$ at the end of the program execution, that is, if the program terminates its execution then the variable $x$ stores a value in $\gamma_A(a)$. Hence, as usual, the property $\varnothing \in \wp(\mathbb{N})$ means that the program does not correctly terminate its execution either by true nontermination or by some run-time error, namely, that the exit program point is not reachable. For simplicity, we do not consider intermediate program points and assertions in our semantics.

For an abstract domain $\langle A, \gamma_A, \le_A\rangle$ in standard abstract interpretation, the corresponding concretization function $\gamma : A \to \wp(\mathrm{Prog})$ of Definition 3.1 is defined as:

$$\gamma(a) \triangleq \{P \in \mathrm{Prog} \mid \forall i \in \mathbb{N}.\ [\![P]\!](i) \in \gamma_A(a)\}$$

where we recall that $[\![P]\!](i) \in \gamma_A(a)$ means $[\![P]\!](i) = o \Rightarrow o \in \gamma_A(a)$. Hence, if $A$ contains top $\top_A$ and bottom $\bot_A$ such that $\gamma_A(\top_A) = \mathbb{N}$ and $\gamma_A(\bot_A) = \varnothing$ then $\gamma(\top_A) = \mathrm{Prog}$ and $\gamma(\bot_A) = \{P \in \mathrm{Prog} \mid P \text{ never terminates}\}$. Moreover, since $\gamma_A$ is monotonic, we have that $\gamma$ is monotonic as well. The fact that all the elements in $A$ are machine representable boils down to the requirement that $A$ is a recursive set, while the binary preorder relation $\le_\gamma$ is decidable because $a_1 \le_A a_2 \Leftrightarrow \gamma(a_1) \subseteq \gamma(a_2)$ holds and $\le_A$ is decidable. This therefore defines an abstract domain according to Definition 3.1.

In this simple view of the abstract domain $A$, there is no input property for the variable $x$, meaning that at the beginning $x$ may store any value. It is easy to generalize the above definition by requiring an input abstract property in $A$ for $x$, so that the abstract domain is a Cartesian product $A \times A$ together with a concretization $\gamma^{io} : A \times A \to \wp(\mathrm{Prog})$ defined as follows:

$$\gamma^{io}(\langle a_i, a_o\rangle) \triangleq \{P \in \mathrm{Prog} \mid \forall i \in \mathbb{N}.\ i \in \gamma_A(a_i) \Rightarrow [\![P]\!](i) \in \gamma_A(a_o)\}.$$

This is a generalization since, for any $a \in A$, we have that $\gamma(a) = \gamma^{io}(\langle \top_A, a\rangle)$.


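Example 3.3 (Interval Abstract Domain). Let Int be the standard interval domain [7] restricted to natural numbers in $\mathbb{N}$, endowed with the standard subset ordering:

$$\mathrm{Int} \triangleq \{[a, b] \mid a, b \in \mathbb{N},\ a \le b\} \cup \{\bot_{\mathrm{Int}}\} \cup \{[a, +\infty) \mid a \in \mathbb{N}\}$$

with concretization $\gamma_{\mathrm{Int}} : \mathrm{Int} \to \wp(\mathbb{N})$, where $\gamma_{\mathrm{Int}}(\bot_{\mathrm{Int}}) = \varnothing$, $\gamma_{\mathrm{Int}}([a, b]) = [a, b]$ and $\gamma_{\mathrm{Int}}([a, +\infty)) = \{n \in \mathbb{N} \mid n \ge a\}$, so that $\gamma_{\mathrm{Int}}([0, +\infty)) = \mathbb{N}$ and $[0, +\infty)$ is also denoted by $\top_{\mathrm{Int}}$. Thus, here, for the concretization function $\gamma : \mathrm{Int} \to \wp(\mathrm{Prog})$ we have that: $\gamma(\top_{\mathrm{Int}}) = \mathrm{Prog}$, $\gamma(\bot_{\mathrm{Int}}) = \{P \in \mathrm{Prog} \mid \forall i.\ [\![P]\!](i)\uparrow\}$, $\gamma([a, +\infty)) = \{P \in \mathrm{Prog} \mid \forall i \in \mathbb{N}.\ [\![P]\!](i)\downarrow \Rightarrow [\![P]\!](i) \ge a\}$. We also have the input/output concretization $\gamma^{io} : \mathrm{Int} \times \mathrm{Int} \to \wp(\mathrm{Prog})$, where

$$\gamma^{io}(\langle I, J\rangle) \triangleq \{P \in \mathrm{Prog} \mid \forall i \in \mathbb{N}.\ i \in \gamma_{\mathrm{Int}}(I) \Rightarrow [\![P]\!](i) \in \gamma_{\mathrm{Int}}(J)\}.$$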
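To make the decidability of the ordering of Int concrete, here is a minimal Python sketch, assuming a hypothetical encoding of Int: an `Interval` class whose upper bound `None` stands for $+\infty$, and a string `BOTTOM` for $\bot_{\mathrm{Int}}$. The function `leq` decides the inclusion $\gamma_{\mathrm{Int}}(I) \subseteq \gamma_{\mathrm{Int}}(J)$ by comparing bounds only; all names are illustrative, not part of the framework.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Interval:
    """A non-bottom element [lo, hi] of Int; hi=None encodes +infinity."""
    lo: int
    hi: Optional[int] = None

    def __post_init__(self) -> None:
        # Int only contains intervals of naturals with lo <= hi.
        if self.lo < 0 or (self.hi is not None and self.hi < self.lo):
            raise ValueError("not an element of Int")


BOTTOM = "bottom"   # hypothetical encoding of bot_Int
TOP = Interval(0)   # [0, +infinity), i.e. top_Int
IntElem = Union[Interval, str]


def member(n: int, i: IntElem) -> bool:
    """Decide n in gamma_Int(i)."""
    if i == BOTTOM:
        return False
    return i.lo <= n and (i.hi is None or n <= i.hi)


def leq(i: IntElem, j: IntElem) -> bool:
    """Decide gamma_Int(i) is a subset of gamma_Int(j) by comparing bounds."""
    if i == BOTTOM:
        return True    # the empty set is included in every set
    if j == BOTTOM:
        return False   # a non-bottom interval is never empty
    if i.lo < j.lo:
        return False   # i contains i.lo, which is below j
    if j.hi is None:
        return True    # j is unbounded above
    return i.hi is not None and i.hi <= j.hi
```

For instance, `leq(Interval(4, 7), Interval(2))` holds, while `leq(Interval(4), Interval(2, 100))` does not, since $[4, +\infty)$ contains 101.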


4 Program Analysers and Verifiers 
In our model, the notions of program analyser and verifier are as general as 
possible. 


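Definition 4.1 (Program Analyser). Given an abstract domain $\langle A, \gamma, \le_\gamma\rangle$, a program analyser on $A$ is any total recursive function $\mathcal{A} : \mathrm{Prog} \to A$.

The set of analysers on a given abstract domain $A$ will be denoted by $\mathbb{A}_A$.

An analyser $\mathcal{A} \in \mathbb{A}_A$ is sound if for any $P \in \mathrm{Prog}$ and $a \in A$,

$$\mathcal{A}(P) \le_\gamma a \Rightarrow P \in \gamma(a)$$

while $\mathcal{A}$ is precise if it is also complete, i.e., if the reverse implication also holds:

$$P \in \gamma(a) \Rightarrow \mathcal{A}(P) \le_\gamma a.$$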


Notice that this definition of soundness is equivalent to the standard notion of sound static analysis, namely, for any program $P$, $\mathcal{A}(P)$ always outputs a program assertion which is satisfied by $P$, i.e., $P \in \gamma(\mathcal{A}(P))$. Let us also note that on the empty abstract domain $\varnothing$, no analyser can be defined simply because there exists no function in $\mathrm{Prog} \to \varnothing$. Instead, for a singleton abstract domain $A_\star = \{\star\}$, if $\mathcal{A} \in \mathbb{A}_{A_\star}$ is sound then $\gamma(\star) = \mathrm{Prog}$, so that $\star$ is necessarily an abstract top. Also, if the abstract domain $A$ contains a top abstract value $\top_A \in A$ then, as expected, $\lambda P.\top_A$ is a trivially sound analyser on $A$. Finally, we observe that if $\mathcal{A}_1$ and $\mathcal{A}_2$ are both precise on the same abstract domain then we have $\mathcal{A}_1 \cong_\gamma \mathcal{A}_2$, meaning that $\mathcal{A}_1$ and $\mathcal{A}_2$ coincide up to equivalent abstract values, i.e., $\gamma \circ \mathcal{A}_1 = \gamma \circ \mathcal{A}_2$. In fact, for any $P \in \mathrm{Prog}$, we have that $P \in \gamma(\mathcal{A}_2(P))$ implies $\gamma(\mathcal{A}_1(P)) \subseteq \gamma(\mathcal{A}_2(P))$ and $P \in \gamma(\mathcal{A}_1(P))$ implies $\gamma(\mathcal{A}_2(P)) \subseteq \gamma(\mathcal{A}_1(P))$, so that $\mathcal{A}_1 \cong_\gamma \mathcal{A}_2$.


Example 4.2. Software metrics static analysers [35] deal with nonsemantic pro- 
gram properties, such as the domain in Example 3.2 (1). Bounded model check- 
ing [4,34] handles program properties such as those encoded by the domains 
of Example 3.2 (2)-(3). Complexity bound analysers such as [32,36] cope with 
domains of properties such as those in Example 3.2 (4)-(5). Numerical abstract 
domains used in program analysis (see [23]) include the interval abstraction 
described in Example 3.3. 


Definition 4.3 (Program Verifier). Given an abstract domain $\langle A, \gamma, \le_\gamma\rangle$, a program verifier on $A$ is any total recursive function $\mathcal{V} : \mathrm{Prog} \times A \to \{t, ?\}$. The set of verifiers on a given abstract domain $A$ will be denoted by $\mathbb{V}_A$.

A verifier $\mathcal{V} \in \mathbb{V}_A$ is sound if for any $P \in \mathrm{Prog}$ and $a \in A$,

$$\mathcal{V}(P, a) = t \Rightarrow P \in \gamma(a)$$

while $\mathcal{V}$ is precise if it is also complete, i.e., if the reverse implication also holds:

$$P \in \gamma(a) \Rightarrow \mathcal{V}(P, a) = t.$$

A verifier $\mathcal{V} \in \mathbb{V}_A$ is nontrivial if for any program there exists at least one assertion which $\mathcal{V}$ is able to prove, i.e., for any $P \in \mathrm{Prog}$ there exists some $a \in A$ such that $\mathcal{V}(P, a) = t$. A verifier is called trivial when it is not nontrivial.

A verifier $\mathcal{V} \in \mathbb{V}_A$ is monotone when the verification algorithm is monotone w.r.t. $\le_\gamma$, i.e., $(\mathcal{V}(P, a) = t \wedge a \le_\gamma a') \Rightarrow \mathcal{V}(P, a') = t$.


Remark 4.4. Let us observe some straightforward consequences of Definition 4.3.

(1) Notice that for all nonempty abstract domains $A$, $\lambda(P, a).?$ is a legal and vacuously sound verifier. Also, if $A = \varnothing$ is the empty abstract domain then the empty verifier $\mathcal{V} : \mathrm{Prog} \times \varnothing \to \{t, ?\}$ (namely, the function with empty graph) is trivially precise.

(2) Let us observe that if $\mathcal{V}$ is nontrivial and monotone then $\mathcal{V}$ is able to prove any abstract top: in fact, if $\top \in A$ and $\gamma(\top) = \mathrm{Prog}$ then, for any $P \in \mathrm{Prog}$, since there exists some $a \in A$ such that $\mathcal{V}(P, a) = t$ and $a \le_\gamma \top$, by monotonicity, $\mathcal{V}(P, \top) = t$.

(3) Note that if a verifier $\mathcal{V}$ is precise then $\mathcal{V}(P, a) = ? \Rightarrow P \notin \gamma(a)$, so that in this case an output $\mathcal{V}(P, a) = ?$ always means that $P$ does not satisfy the property $a$.

(4) Finally, if $\mathcal{V}_1$ and $\mathcal{V}_2$ are precise on the same abstract domain then $\mathcal{V}_1(P, a) = t \Leftrightarrow P \in \gamma(a) \Leftrightarrow \mathcal{V}_2(P, a) = t$, so that $\mathcal{V}_1 = \mathcal{V}_2$.

Example 4.5. Program verifiers abound in the literature, e.g., [3,21,27]. For example, [13] aims at complexity verification on domains like that in Example 3.2 (5), while reachability verifiers like [33] can check numerical properties of program variables such as those of Example 3.3.


5 Rice’s Theorem for Static Program Analysis and Verification


Classical Rice’s Theorem in computability theory [26,29,30] states that an extensional property $\Pi \subseteq \mathbb{N}$ of an effective numbering $\{\varphi_n \mid n \in \mathbb{N}\} = \mathbb{N} \rightharpoonup \mathbb{N}$ of partial recursive functions is a recursive set if and only if $\Pi = \varnothing$ or $\Pi = \mathbb{N}$, i.e., $\Pi$ is trivial. Let us recall that $\Pi \subseteq \mathbb{N}$ is extensional when $\varphi_n = \varphi_m$ implies $n \in \Pi \Leftrightarrow m \in \Pi$. When dealing with program properties rather than indices of partial recursive functions, i.e., when $\Pi \subseteq \mathrm{Prog}$, Rice’s Theorem states that any nontrivial semantic program property is undecidable (see [28] for a statement of Rice’s Theorem tailored for program properties). It is worth recalling that Rice’s Theorem has been extended by Asperti [2] through an interesting generalization to so-called “complexity cliques”, namely nonextensional program properties which may take into account the space or time complexity of programs: for example, the abstract domain of Example 3.2 (5) is not extensional but, when logically “intersected” with an extensional domain (i.e., it is a product domain $A_1 \times A_2$ where the concretization function is the set intersection $\lambda\langle a_1, a_2\rangle.\, \gamma_1(a_1) \cap \gamma_2(a_2)$), falls into this generalized version of Rice’s Theorem.

In the following, we provide an instantiation of Rice’s Theorem to sound static program analysis and verification by introducing a notion of extensionality for abstract domains. Abstract domains commonly used in abstract interpretation turn out to be extensional when they are used for approximating the input/output behaviour of programs. For example, if a sound abstract interpretation of a program $P$ in the interval abstract domain computes as abstract output a program assertion such as $x \in [1, 5]$ and $y \in [2, +\infty)$, then this assertion is a sound abstract output for any other program $Q$ having the same input/output behaviour as $P$.


Definition 5.1 (Extensional Abstract Domain). An abstract domain $\langle A, \gamma, \le_\gamma\rangle$ is extensional when for any $a \in A$, $\gamma(a) \subseteq \mathrm{Prog}$ is an extensional program property, namely, if $[\![P]\!] = [\![Q]\!]$ then $P \in \gamma(a) \Leftrightarrow Q \in \gamma(a)$.


As usual, the intuition is that an extensional program property depends exclusively on the input/output program semantics $[\![\cdot]\!]$. As a simple example, the domains of Example 3.2 (3)-(4) are extensional while the domains of Example 3.2 (1)-(2)-(5) are not.


Definition 5.2 (Trivial Abstract Domain). An abstract domain $\langle A, \gamma, \le_\gamma\rangle$ is trivial when $A$ contains abstract bottom or top elements only, i.e., for any $a \in A$, $\gamma(a) \in \{\varnothing, \mathrm{Prog}\}$.


Program Analysis Is Harder Than Verification: A Computability Perspective 83 


Definition 5.2 allows four possible types for a trivial abstract domain $A$: (1) $A = \varnothing$; (2) $A$ is nonempty and consists of bottom elements only, i.e., $A \neq \varnothing$ and for all $a \in A$, $\gamma(a) = \varnothing$; (3) $A$ is nonempty and consists of top elements only, i.e., $A \neq \varnothing$ and for all $a \in A$, $\gamma(a) = \mathrm{Prog}$; (4) $A$ mixes the two previous cases, i.e., $A$ contains both bottom and top elements.


Theorem 5.3 (Rice’s Theorem for Program Analysis). Let $\langle A, \gamma, \le_\gamma\rangle$ be an extensional abstract domain and let $\mathcal{A} \in \mathbb{A}_A$ be a sound analyser. Then, $\mathcal{A}$ is precise iff $A$ is trivial.

Proof. Since we assume the existence of a sound analyser $\mathcal{A} \in \mathbb{A}_A$ on the extensional abstract domain $A$, observe that necessarily $A \neq \varnothing$.

Assume that $A$ is trivial. We have to show that for any $a \in A$ and $P \in \mathrm{Prog}$, $\mathcal{A}(P) \le_\gamma a \Leftrightarrow P \in \gamma(a)$. Assume that $P \in \gamma(a)$ for some $a \in A$. Then, we have that $\gamma(a) \neq \varnothing$, so that, since $A$ is trivial, it must necessarily be that $\gamma(a) = \mathrm{Prog}$. By soundness of $\mathcal{A}$, $P \in \gamma(\mathcal{A}(P))$, so that, since $A$ is trivial, $\gamma(\mathcal{A}(P)) = \mathrm{Prog}$. Hence, we have that $\gamma(\mathcal{A}(P)) = \gamma(a)$, thus implying $\mathcal{A}(P) \le_\gamma a$. On the other hand, if $\mathcal{A}(P) \le_\gamma a$ then $\gamma(\mathcal{A}(P)) \subseteq \gamma(a)$, so that, since, by soundness of $\mathcal{A}$, $P \in \gamma(\mathcal{A}(P))$, we also have that $P \in \gamma(a)$.

Conversely, assume now that $\mathcal{A}$ is precise, namely, $P \in \gamma(a)$ iff $\mathcal{A}(P) \le_\gamma a$. Thus, since $\mathcal{A}$ is a total recursive function and $\le_\gamma$ is decidable, we have that, for any $a \in A$, $P \in^? \gamma(a)$ is decidable. Since $\gamma(a)$ is an extensional program property, by Rice’s Theorem, $\gamma(a)$ must necessarily be trivial, i.e., $\gamma(a) \in \{\varnothing, \mathrm{Prog}\}$. This means that the abstract domain $A$ is trivial.


Rice’s Theorem for program analysis can be applied to several abstract 
domains. Due to lack of space, we just mention that the well-known undecid- 
ability of computing the meet over all paths (MOP) solution for a monotone 
dataflow analysis problem, proved by Kam and Ullman [15, Sect. 6] by resorting 
to undecidability of Post’s Correspondence Problem, can be derived as a simple 
consequence of Theorem 5.3. 

Along the same lines as Theorem 5.3, Rice’s Theorem can be instantiated to program verification as follows.


Theorem 5.4 (Rice’s Theorem for Program Verification). Let $\langle A, \gamma, \le_\gamma\rangle$ be an extensional abstract domain and let $\mathcal{V} \in \mathbb{V}_A$ be a sound, nontrivial and monotone verifier. Then, $\mathcal{V}$ is precise iff $A$ is trivial.

Proof. Let $A$ be an extensional abstract domain and $\mathcal{V} \in \mathbb{V}_A$ be sound and nontrivial. If $A = \varnothing$ then $A$ is trivial while the only possible verifier $\mathcal{V} : \mathrm{Prog} \times \varnothing \to \{t, ?\}$ is the empty verifier, which is vacuously precise but is not nontrivial. Thus, $A \neq \varnothing$ holds.

Assume that $\mathcal{V}$ is precise, that is, $P \in \gamma(a)$ iff $\mathcal{V}(P, a) = t$. Hence, since $\mathcal{V}$ is a total recursive function, $\mathcal{V}(P, a) =^? t$ is decidable, so that $P \in^? \gamma(a)$ is decidable as well. As in the proof of Theorem 5.3, since $\gamma(a)$ is an extensional program property, by Rice’s Theorem, $\gamma(a) \in \{\varnothing, \mathrm{Prog}\}$. Thus, the abstract domain $A$ is trivial.

Conversely, let $A \neq \varnothing$ be a trivial abstract domain. We have to prove that for any $a \in A$ and $P \in \mathrm{Prog}$, $\mathcal{V}(P, a) = t \Leftrightarrow P \in \gamma(a)$. Consider any $a \in A$. Since $A$ is trivial, $\gamma(a) \in \{\varnothing, \mathrm{Prog}\}$. If $\gamma(a) = \varnothing$ then, by soundness of $\mathcal{V}$, for any $P \in \mathrm{Prog}$, $\mathcal{V}(P, a) = ?$, so that $\mathcal{V}(P, a) = t \Leftrightarrow P \in \gamma(a)$ holds. If, instead, $\gamma(a) = \mathrm{Prog}$, i.e. $a$ is an abstract top, then, since $\mathcal{V}$ is assumed to be nontrivial and monotone, by Remark 4.4 (2), $\mathcal{V}$ is able to prove the abstract top $a$ for any program, namely, for any $P \in \mathrm{Prog}$, $\mathcal{V}(P, a) = t$, so that $\mathcal{V}(P, a) = t \Leftrightarrow P \in \gamma(a)$ holds.


Let us remark a noteworthy difference of Theorem 5.4 w.r.t. Rice’s Theorem for static analysis. Let us consider a trivial abstract domain $A = \{\top\}$ with $\gamma(\top) = \mathrm{Prog}$. Here, the trivially sound analyser $\lambda P.\top$ is also precise, in accordance with Theorem 5.3. Instead, the trivially sound verifier $\mathcal{V}_? = \lambda(P, a).?$ is not precise, because $P \in \gamma(\top) \Rightarrow \mathcal{V}_?(P, \top) = t$ does not hold. The point here is that $\mathcal{V}_?$ lacks the property of being nontrivial, and therefore Theorem 5.4 cannot be applied. On the other hand, $\mathcal{V}_t = \lambda(P, a).t$ is nontrivial and precise, because, in this case, $P \in \gamma(\top) \Leftrightarrow \mathcal{V}_t(P, \top) = t$ holds. Similarly, if we consider the trivial abstract domain $A' = \{\top, \top'\}$, with $\gamma(\top) = \mathrm{Prog} = \gamma(\top')$, then the verifier

$$\mathcal{V}'(P, a) \triangleq \begin{cases} t & \text{if } a = \top \\ ? & \text{if } a = \top' \end{cases}$$

is sound and nontrivial, but still $\mathcal{V}'$ is not precise, because $P \in \gamma(\top') \Rightarrow \mathcal{V}'(P, \top') = t$ does not hold. The point here is that $\mathcal{V}'$ is not monotone, because $\mathcal{V}'(P, \top) = t$ and $\top \le_\gamma \top'$ but $\mathcal{V}'(P, \top') \neq t$, so that Theorem 5.4 cannot be applied.
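The three verifiers just discussed can be replayed on a finite sample of programs. The following Python sketch checks nontriviality and monotonicity (Definition 4.3) only over a given finite list of programs, so a positive answer is evidence rather than a proof for the whole of Prog; the string encodings `"top"` and `"top'"` of $\top$ and $\top'$ and the program names are hypothetical.

```python
from typing import Callable, Hashable, List

# A verifier maps (program, assertion) to "t" or "?" (Definition 4.3).
Verifier = Callable[[Hashable, Hashable], str]
Leq = Callable[[Hashable, Hashable], bool]


def is_nontrivial(v: Verifier, programs: List[Hashable], domain: List[Hashable]) -> bool:
    """Every sampled program has at least one assertion proved by v."""
    return all(any(v(p, a) == "t" for a in domain) for p in programs)


def is_monotone(v: Verifier, programs: List[Hashable],
                domain: List[Hashable], leq: Leq) -> bool:
    """Whenever v proves a and a <=_gamma b, v also proves b."""
    return all(
        v(p, b) == "t"
        for p in programs
        for a in domain
        for b in domain
        if v(p, a) == "t" and leq(a, b)
    )


# Trivial domain A' = {top, top'}: both concretize to Prog, so each implies the other.
DOMAIN = ["top", "top'"]
PROGRAMS = ["P1", "P2", "P3"]  # hypothetical finite sample of programs


def leq_trivial(a: Hashable, b: Hashable) -> bool:
    return True


def v_unknown(p: Hashable, a: Hashable) -> str:  # lambda (P, a). ?
    return "?"


def v_true(p: Hashable, a: Hashable) -> str:     # lambda (P, a). t
    return "t"


def v_prime(p: Hashable, a: Hashable) -> str:    # V': proves top but not top'
    return "t" if a == "top" else "?"
```

On this sample, `v_unknown` fails `is_nontrivial`, `v_prime` passes `is_nontrivial` but fails `is_monotone`, and `v_true` passes both, in line with the discussion of Theorem 5.4.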


6 Comparing Analysers and Verifiers 
Let us now focus on a model for comparing the relative precision of program analysers and verifiers w.r.t. a common abstract domain $\langle A, \gamma, \le_\gamma\rangle$.


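Definition 6.1 (Comparison Relations). Let $\mathcal{V}, \mathcal{V}' \in \mathbb{V}_A$, $\mathcal{A}, \mathcal{A}' \in \mathbb{A}_A$, and $X, Y \in \mathbb{V}_A \cup \mathbb{A}_A$.

(1) $\mathcal{V} \sqsubseteq \mathcal{V}'$ iff $\forall P \in \mathrm{Prog}.\, \forall a \in A.\ \mathcal{V}'(P, a) = t \Rightarrow \mathcal{V}(P, a) = t$
(2) $\mathcal{A} \sqsubseteq \mathcal{A}'$ iff $\forall P \in \mathrm{Prog}.\ \mathcal{A}(P) \le_\gamma \mathcal{A}'(P)$
(3) $\mathcal{V} \sqsubseteq \mathcal{A}$ iff $\forall P \in \mathrm{Prog}.\, \forall a \in A.\ \mathcal{A}(P) \le_\gamma a \Rightarrow \mathcal{V}(P, a) = t$
(4) $\mathcal{A} \sqsubseteq \mathcal{V}$ iff $\forall P \in \mathrm{Prog}.\, \forall a \in A.\ \mathcal{V}(P, a) = t \Rightarrow \mathcal{A}(P) \le_\gamma a$
(5) $X \cong Y$ when $X \sqsubseteq Y$ and $Y \sqsubseteq X$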


Let us comment on the previous definitions, which intuitively take into account the relative “verification powers” of verifiers and analysers. The relation $\mathcal{V} \sqsubseteq \mathcal{V}'$ holds when every assertion proved by $\mathcal{V}'$ can also be proved by $\mathcal{V}$, while $\mathcal{A} \sqsubseteq \mathcal{A}'$ means that the output assertion provided by $\mathcal{A}$ is more precise than that produced by $\mathcal{A}'$. Also, a verifier $\mathcal{V}$ is more precise than an analyser $\mathcal{A}$ when the verification power of $\mathcal{V}$ is not less than the verification power of $\mathcal{A}$, namely, any assertion $a$ which can be proved by $\mathcal{A}$ for a program $P$, i.e. $\mathcal{A}(P) \le_\gamma a$ holds, can also be proved by $\mathcal{V}$. Likewise, $\mathcal{A}$ is more precise than $\mathcal{V}$ when any assertion $a$ proved by $\mathcal{V}$ can also be proved by $\mathcal{A}$, i.e., $\mathcal{V}(P, a) = t$ implies $\mathcal{A}(P) \le_\gamma a$.
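Relations (3)-(5) of Definition 6.1 can be checked mechanically when both programs and assertions range over finite samples. The Python sketch below does so for a hypothetical toy setting: assertions are the bounds $0, \dots, 3$ of Example 3.2 (4) (the truncation to four values is an assumption), and a "program" is modelled only by the tuple of outputs it produces on a few inputs, with `None` marking nontermination. These modelling choices and all names are illustrative, not part of the framework, and a real check would have to range over all of Prog.

```python
from typing import Callable, Hashable, List, Optional, Tuple

Analyser = Callable[[Hashable], Hashable]
Verifier = Callable[[Hashable, Hashable], str]
Leq = Callable[[Hashable, Hashable], bool]


def verifier_below(v: Verifier, an: Analyser, programs: List, domain: List, leq: Leq) -> bool:
    """Definition 6.1 (3), V below A: whatever A proves, V proves too."""
    return all(v(p, x) == "t" for p in programs for x in domain if leq(an(p), x))


def analyser_below(an: Analyser, v: Verifier, programs: List, domain: List, leq: Leq) -> bool:
    """Definition 6.1 (4), A below V: whatever V proves, A proves too."""
    return all(leq(an(p), x) for p in programs for x in domain if v(p, x) == "t")


def equivalent(an: Analyser, v: Verifier, programs: List, domain: List, leq: Leq) -> bool:
    """Definition 6.1 (5): both directions hold on the sample."""
    return (verifier_below(v, an, programs, domain, leq)
            and analyser_below(an, v, programs, domain, leq))


# Hypothetical toy instance (cf. Example 3.2 (4)): n asserts "every output is <= n".
Program = Tuple[Optional[int], ...]
BOUNDS = [0, 1, 2, 3]
SAMPLE: List[Program] = [(0, 2, None), (1, 1), (None,), (3, 0)]


def leq_bound(n: int, m: int) -> bool:
    return n <= m


def max_output(p: Program) -> int:
    """Toy analyser: largest terminating output (0 if the program never terminates)."""
    return max((o for o in p if o is not None), default=0)


def exact_verifier(p: Program, n: int) -> str:
    return "t" if max_output(p) <= n else "?"


def coarse_verifier(p: Program, n: int) -> str:
    # Deliberately weaker: proves n only with one unit of slack.
    return "t" if max_output(p) + 1 <= n else "?"
```

Here `max_output` plays the role of an analyser: it is equivalent to `exact_verifier` on the sample, and it is strictly more precise than `coarse_verifier`, which fails to prove the bound 2 for the program `(0, 2, None)`.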

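Let us observe that $\langle \mathbb{V}_A, \sqsubseteq\rangle$ turns out to be a poset, while $\langle \mathbb{A}_A, \sqsubseteq\rangle$ is just a preordered set (cf. the lattice of abstract interpretations in [8]). We have that $\langle \mathbb{V}_A, \sqsubseteq\rangle$ has a greatest element $\mathcal{V}_? \triangleq \lambda(P, a).?$, which, in particular, is always sound although it is trivial. On the other hand, if $A$ includes a top element $\top$ then $\mathcal{A}_\top \triangleq \lambda P.\top$ is a sound analyser which is a maximal element in $\langle \mathbb{A}_A, \sqsubseteq\rangle$. Also, $\mathcal{V} \cong \mathcal{V}'$ means that $\mathcal{V} = \mathcal{V}'$ as total functions, while $\mathcal{A} \cong \mathcal{A}'$ means that $\gamma \circ \mathcal{A} = \gamma \circ \mathcal{A}'$. Moreover, the comparison relation $\sqsubseteq$ is transitive even when considering analysers and verifiers together: if $\mathcal{V} \sqsubseteq \mathcal{A}$ and $\mathcal{A} \sqsubseteq \mathcal{V}'$ then $\mathcal{V} \sqsubseteq \mathcal{V}'$, and if $\mathcal{A} \sqsubseteq \mathcal{V}$ and $\mathcal{V} \sqsubseteq \mathcal{A}'$ then $\mathcal{A} \sqsubseteq \mathcal{A}'$. Also, the relation $\sqsubseteq$ shifts soundness from verifiers to analysers, and from analysers to verifiers as follows (due to lack of space the proof is omitted).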


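Lemma 6.2. Let $\mathcal{V} \in \mathbb{V}_A$ and $\mathcal{A} \in \mathbb{A}_A$. If $\mathcal{V}$ is sound and $\mathcal{V} \sqsubseteq \mathcal{A}$ then $\mathcal{A}$ is sound; if $\mathcal{A}$ is sound and $\mathcal{A} \sqsubseteq \mathcal{V}$ then $\mathcal{V}$ is sound.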


As expected, any sound analyser can be used to refine a given sound verifier 
(cf. [19,20,24,25]) and this can be formalized and proved in our framework as 
follows. 


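Lemma 6.3. Given $\mathcal{A} \in \mathbb{A}_A$ and $\mathcal{V} \in \mathbb{V}_A$ which are both sound, let

$$\tau_{\mathcal{A}}(\mathcal{V})(P, a) \triangleq \begin{cases} t & \text{if } \mathcal{A}(P) \le_\gamma a \\ \mathcal{V}(P, a) & \text{if } \mathcal{A}(P) \not\le_\gamma a \end{cases}$$

Then, $\tau_{\mathcal{A}}(\mathcal{V}) \in \mathbb{V}_A$ is sound, $\tau_{\mathcal{A}}(\mathcal{V}) \sqsubseteq \mathcal{V}$ and $\tau_{\mathcal{A}}(\mathcal{V}) \cong \mathcal{V} \Leftrightarrow \mathcal{V} \sqsubseteq \mathcal{A}$.

Proof. $\tau_{\mathcal{A}}(\mathcal{V}) \in \mathbb{V}_A$ is sound because both $\mathcal{A}$ and $\mathcal{V}$ are sound. If $\mathcal{V}(P, a) = t$ then $\tau_{\mathcal{A}}(\mathcal{V})(P, a) = t$, i.e., $\tau_{\mathcal{A}}(\mathcal{V}) \sqsubseteq \mathcal{V}$. Moreover, $\tau_{\mathcal{A}}(\mathcal{V}) = \mathcal{V}$ iff, for all $P$ and $a$, $\mathcal{A}(P) \le_\gamma a \Rightarrow \mathcal{V}(P, a) = t$, iff $\mathcal{V} \sqsubseteq \mathcal{A}$.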
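The construction $\tau_{\mathcal{A}}(\mathcal{V})$ of Lemma 6.3 is a simple case split, sketched below in Python. Analysers and verifiers are modelled as plain functions returning an abstract value and the strings `"t"`/`"?"` respectively; the toy bound domain used in the usage lines is a hypothetical instance, and soundness of the inputs is assumed, not checked.

```python
from typing import Callable, Hashable

Analyser = Callable[[Hashable], Hashable]
Verifier = Callable[[Hashable, Hashable], str]
Leq = Callable[[Hashable, Hashable], bool]


def refine(analyser: Analyser, verifier: Verifier, leq: Leq) -> Verifier:
    """tau_A(V): answer t whenever A(P) <=_gamma a, otherwise defer to V."""
    def refined(program: Hashable, assertion: Hashable) -> str:
        if leq(analyser(program), assertion):
            return "t"
        return verifier(program, assertion)
    return refined


# Hypothetical toy usage: a program is a tuple of outputs (all assumed <= 3),
# and the assertion n means "every output is <= n" on the domain {0, 1, 2, 3}.
def bound_analyser(program):
    return max(program)


def top_only_verifier(program, n):
    # Only proves the weakest bound 3 of the domain.
    return "t" if n == 3 else "?"


refined = refine(bound_analyser, top_only_verifier, lambda n, m: n <= m)
```

On this instance `refined((0, 1), 1)` yields `"t"` although `top_only_verifier((0, 1), 1)` yields `"?"`, illustrating $\tau_{\mathcal{A}}(\mathcal{V}) \sqsubseteq \mathcal{V}$.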


6.1 Optimal and Best Analysers and Verifiers 


It makes sense to define optimality by restricting to sound analysers and verifiers only. Optimality is defined as minimality w.r.t. the precision relation $\sqsubseteq$, while being the best analyser/verifier means being the most precise.


Definition 6.4 (Optimal and Best Analysers and Verifiers). A sound analyser $\mathcal{A} \in \mathbb{A}_A$ is optimal if for any sound $\mathcal{A}' \in \mathbb{A}_A$, $\mathcal{A}' \sqsubseteq \mathcal{A} \Rightarrow \mathcal{A}' \cong \mathcal{A}$, while $\mathcal{A}$ is a best analyser if for any sound $\mathcal{A}' \in \mathbb{A}_A$, $\mathcal{A} \sqsubseteq \mathcal{A}'$.

A sound verifier $\mathcal{V} \in \mathbb{V}_A$ is optimal if for any sound $\mathcal{V}' \in \mathbb{V}_A$, $\mathcal{V}' \sqsubseteq \mathcal{V} \Rightarrow \mathcal{V}' \cong \mathcal{V}$, while $\mathcal{V}$ is the best verifier if for any sound $\mathcal{V}' \in \mathbb{V}_A$, $\mathcal{V} \sqsubseteq \mathcal{V}'$.

Let us first observe that if a best verifier exists then it is unique, while for analysers, if $\mathcal{A}_1$ and $\mathcal{A}_2$ are two best analysers on $A$ then $\mathcal{A}_1 \cong \mathcal{A}_2$ holds. Of course, the possibility of defining an optimal/best analyser or verifier depends on the abstract domain $A$. For example, for a variable sign domain such as $\{\mathbb{Z}_{\le 0}, \mathbb{Z}_{\ge 0}, \mathbb{Z}\}$ only optimal analysers and verifiers could be defined, because for approximating the set $\{0\}$ two optimal sound abstract values are available rather than a best sound abstract value. Here, the expected but interesting property to remark is that the notion of precise (i.e., sound and complete) analyser turns out to coincide with the notion of being the best analyser.
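The sign-domain remark above can be checked mechanically. In the Python sketch below, the three abstract values are encoded as strings and their implication order is hard-coded (both are assumptions for illustration); for a finite set of integers, the sketch lists the optimal, i.e. minimal, sound abstract values and searches for a best one.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical encoding of the sign domain {Z<=0, Z>=0, Z}.
SIGNS: List[str] = ["Z<=0", "Z>=0", "Z"]
MEMBER: Dict[str, Callable[[int], bool]] = {
    "Z<=0": lambda n: n <= 0,
    "Z>=0": lambda n: n >= 0,
    "Z": lambda n: True,
}
# The decidable implication order: each element implies itself and Z.
ORDER = {(a, a) for a in SIGNS} | {("Z<=0", "Z"), ("Z>=0", "Z")}


def leq(a: str, b: str) -> bool:
    return (a, b) in ORDER


def sound_values(concrete: Set[int]) -> List[str]:
    """Abstract values whose concretization contains every element of the set."""
    return [a for a in SIGNS if all(MEMBER[a](n) for n in concrete)]


def optimal_values(concrete: Set[int]) -> List[str]:
    """Sound values with no strictly more precise sound value."""
    sound = sound_values(concrete)
    return [a for a in sound
            if not any(leq(b, a) and not leq(a, b) for b in sound)]


def best_value(concrete: Set[int]) -> Optional[str]:
    """A sound value below every other sound value, if one exists."""
    sound = sound_values(concrete)
    return next((a for a in sound if all(leq(a, b) for b in sound)), None)
```

For the set $\{0\}$, `optimal_values` returns both `"Z<=0"` and `"Z>=0"` while `best_value` returns `None`; for $\{-3, -1\}$ the best value `"Z<=0"` exists.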


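Lemma 6.5. Let $\mathcal{A} \in \mathbb{A}_A$ be sound. Then, $\mathcal{A}$ is precise iff $\mathcal{A}$ is a best analyser.

Proof. ($\Rightarrow$) Consider any sound $\mathcal{A}' \in \mathbb{A}_A$. Assume, by contradiction, that $\mathcal{A} \not\sqsubseteq \mathcal{A}'$, namely, there exists some $P \in \mathrm{Prog}$ such that $\gamma(\mathcal{A}(P)) \not\subseteq \gamma(\mathcal{A}'(P))$. By soundness of $\mathcal{A}'$, $P \in \gamma(\mathcal{A}'(P))$, so that, by precision of $\mathcal{A}$, $\gamma(\mathcal{A}(P)) \subseteq \gamma(\mathcal{A}'(P))$, which is a contradiction. Thus, $\mathcal{A} \sqsubseteq \mathcal{A}'$ holds. This means that $\mathcal{A}$ is a best analyser on $A$.

($\Leftarrow$) We have to prove that for any $P \in \mathrm{Prog}$ and $a \in A$, $P \in \gamma(a) \Rightarrow \gamma(\mathcal{A}(P)) \subseteq \gamma(a)$. Assume, by contradiction, that there exist $Q \in \mathrm{Prog}$ and $b \in A$ such that $Q \in \gamma(b)$ and $\gamma(\mathcal{A}(Q)) \not\subseteq \gamma(b)$. Then, we define $\mathcal{A}' : \mathrm{Prog} \to A$ as follows:

$$\mathcal{A}'(P) \triangleq \begin{cases} \mathcal{A}(P) & \text{if } P \neq Q \\ b & \text{if } P = Q \end{cases}$$

It turns out that $\mathcal{A}'$ is a total recursive function because $P = Q$ is decidable. Moreover, $\mathcal{A}'$ is sound: assume that $\gamma(\mathcal{A}'(P)) \subseteq \gamma(a)$; if $P \neq Q$ then $\mathcal{A}'(P) = \mathcal{A}(P)$, so that $\gamma(\mathcal{A}(P)) \subseteq \gamma(a)$ and, by soundness of $\mathcal{A}$, $P \in \gamma(a)$; if $P = Q$ then $\mathcal{A}'(Q) = b$, so that $\gamma(b) = \gamma(\mathcal{A}'(Q)) \subseteq \gamma(a)$, hence $Q \in \gamma(b)$ implies $Q \in \gamma(a)$. Since $\mathcal{A}$ is a best analyser on $A$, we have that $\mathcal{A} \sqsubseteq \mathcal{A}'$, so that $\gamma(\mathcal{A}(Q)) \subseteq \gamma(\mathcal{A}'(Q)) = \gamma(b)$, which is a contradiction.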


We therefore derive the following consequence of Rice’s Theorem 5.3 for static 
analysis: the best analyser on an extensional abstract domain A exists if and only 
if A is trivial. This fact formalizes in our model the common intuition that, given 
any abstract domain, the best static analyser (where best means for any input 
program) cannot be defined due to Rice’s Theorem. An analogous result can be 
given for verifiers. 


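Lemma 6.6. Let $\mathcal{V} \in \mathbb{V}_A$ be sound. Then $\mathcal{V}$ is precise iff $\mathcal{V}$ is the best verifier on $A$.

Proof. Assume that $\mathcal{V}$ is precise and let $\mathcal{V}' \in \mathbb{V}_A$ be sound. If $\mathcal{V}'(P, a) = t$ then, by soundness of $\mathcal{V}'$, $P \in \gamma(a)$, and in turn, by completeness of $\mathcal{V}$, $\mathcal{V}(P, a) = t$, thus proving that $\mathcal{V} \sqsubseteq \mathcal{V}'$. On the other hand, assume that $\mathcal{V}$ is the best verifier on $A$. Assume, by contradiction, that $\mathcal{V}$ is not complete, namely that there exist some $Q \in \mathrm{Prog}$ and $b \in A$ such that $Q \in \gamma(b)$ and $\mathcal{V}(Q, b) = ?$. We then define $\mathcal{V}' : \mathrm{Prog} \times A \to \{t, ?\}$ as follows:

$$\mathcal{V}'(P, a) \triangleq \begin{cases} t & \text{if } P = Q \text{ and } a = b \\ \mathcal{V}(P, a) & \text{otherwise} \end{cases}$$

Then, $\mathcal{V}'$ is a total recursive function because $P = Q$ and $a = b$ are decidable. Also, $\mathcal{V}'$ is sound because $Q \in \gamma(b)$ and $\mathcal{V}$ is sound. Since $\mathcal{V}$ is the best verifier, we have that $\mathcal{V} \sqsubseteq \mathcal{V}'$, so that $\mathcal{V}'(Q, b) = t$ implies $\mathcal{V}(Q, b) = t$, which is a contradiction.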


Thus, similarly to static analysis, as a consequence of Rice’s Theorem 5.4 for 
verification, the best nontrivial and monotone verifier on an extensional abstract 
domain A exists if and only if A is trivial, which is a common belief in program 
verification. Let us also remark that best abstract program semantics, rather 
than program analysers, do exist for nontrivial domains (see e.g. [6]). Clearly, this 
is not in contradiction with Theorem 5.3 since these abstract program semantics 
are not total recursive functions, i.e., they are not program analysers. 


7 Reducing Verification to Analysis and Back 


As usual in computability and complexity, our comparison between verification 
and analysis is made through a many-one reduction, namely by reducing a ver- 
ification problem into an analysis problem and vice versa. The minimal require- 
ment is that these reduction functions are total recursive. Moreover, we require 
that the reduction function does not depend upon a fixed abstract domain. This 
allows us to be problem agnostic and to prove a reduction for all possible ver- 
ifiers and analysers. Program verification and analysis are therefore equivalent 
problems whenever we can reduce one to the other. In the following, we prove 
that while it is always possible to transform a program analyser into an equiv- 
alent program verifier, the converse does not hold in general, but it can always 
be done for finite abstract domains. 


7.1 Reducing Verification to Analysis 


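Theorem 7.1. Let $\langle A, \gamma, \le_\gamma\rangle$ be any given abstract domain. There exists a transform $\sigma : \mathbb{A}_A \to \mathbb{V}_A$ such that:

(1) $\sigma$ is a total recursive function such that for all $\mathcal{A} \in \mathbb{A}_A$, $\sigma(\mathcal{A}) \cong \mathcal{A}$;

(2) if $\mathcal{A} \in \mathbb{A}_A$ is sound then $\sigma(\mathcal{A})$ is sound;

(3) $\sigma$ is monotonic;

(4) $\sigma(\mathcal{A}) \cong \sigma(\mathcal{A}') \Leftrightarrow \mathcal{A} \cong \mathcal{A}'$.

Proof. Given $\mathcal{A} \in \mathbb{A}_A$, we define $\sigma(\mathcal{A}) : \mathrm{Prog} \times A \to \{t, ?\}$ as follows:

$$\sigma(\mathcal{A})(P, a) \triangleq \begin{cases} t & \text{if } \mathcal{A}(P) \le_\gamma a \\ ? & \text{otherwise} \end{cases}$$

(1) Since $\mathcal{A}$ is a total recursive function and $\le_\gamma$ is decidable, we have that $\sigma(\mathcal{A})$ is a total recursive function, namely $\sigma(\mathcal{A}) \in \mathbb{V}_A$, and $\sigma$ is a total recursive function as well. Since, by definition, $\sigma(\mathcal{A})(P, a) = t \Leftrightarrow \mathcal{A}(P) \le_\gamma a$, we have that $\sigma(\mathcal{A}) \cong \mathcal{A}$. (2) By Lemma 6.2, if $\mathcal{A}$ is sound then the equivalent verifier $\sigma(\mathcal{A})$ is sound as well. (3) It turns out that $\sigma$ is monotonic: if $\mathcal{A} \sqsubseteq \mathcal{A}'$ then $\sigma(\mathcal{A}')(P, a) = t \Leftrightarrow \mathcal{A}'(P) \le_\gamma a \Rightarrow \mathcal{A}(P) \le_\gamma \mathcal{A}'(P) \le_\gamma a \Leftrightarrow \sigma(\mathcal{A})(P, a) = t$, so that $\sigma(\mathcal{A}) \sqsubseteq \sigma(\mathcal{A}')$ holds. (4) Assume that $\sigma(\mathcal{A}) \cong \sigma(\mathcal{A}')$; hence, for any $P \in \mathrm{Prog}$, $\sigma(\mathcal{A})(P, \mathcal{A}(P)) = \sigma(\mathcal{A}')(P, \mathcal{A}(P))$, namely, $\mathcal{A}(P) \le_\gamma \mathcal{A}(P) \Leftrightarrow \mathcal{A}'(P) \le_\gamma \mathcal{A}(P)$, so that $\mathcal{A}'(P) \le_\gamma \mathcal{A}(P)$ holds. On the other hand, $\mathcal{A}(P) \le_\gamma \mathcal{A}'(P)$ can be dually obtained; therefore $\gamma(\mathcal{A}(P)) = \gamma(\mathcal{A}'(P))$ holds, namely $\mathcal{A} \cong \mathcal{A}'$. Conversely, if $\mathcal{A} \cong \mathcal{A}'$ then $\sigma(\mathcal{A}) \cong \sigma(\mathcal{A}')$ follows from (3).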


Intuitively, Theorem 7.1 shows that program verification on a given abstract domain $A$ can always and unconditionally be reduced to program analysis on $A$. This means that a solution to the program analysis problem on $A$, i.e. the definition of an analyser $\mathcal{A}$, can constructively be transformed into a solution to the program verification problem on the same domain $A$, i.e. the design of a verifier $\sigma(\mathcal{A})$ which is equivalent to $\mathcal{A}$. The proof of Theorem 7.1 provides this constructive transform $\sigma$, which is defined as expected: an analyser $\mathcal{A}$ on any (possibly infinite) abstract domain $A$ can be used as a verifier for any assertion $a \in A$ simply by checking whether $\mathcal{A}(P) \le_\gamma a$ holds or not.
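The transform $\sigma$ is a thin wrapper around the analyser and the decision procedure for $\le_\gamma$. The Python sketch below assumes both are given as functions (`analyser` and `leq` are hypothetical names) and represents the verifier outputs $t$ and $?$ as strings. Note that nothing requires the domain to be finite: the resulting verifier never enumerates assertions.

```python
from typing import Callable, Hashable

Analyser = Callable[[Hashable], Hashable]
Verifier = Callable[[Hashable, Hashable], str]
Leq = Callable[[Hashable, Hashable], bool]


def sigma(analyser: Analyser, leq: Leq) -> Verifier:
    """sigma(A): prove a exactly when the analyser's output implies a."""
    def verifier(program: Hashable, assertion: Hashable) -> str:
        return "t" if leq(analyser(program), assertion) else "?"
    return verifier


# Hypothetical toy usage: a program is a tuple of outputs, n asserts "outputs <= n".
bound_verifier = sigma(lambda program: max(program), lambda n, m: n <= m)
```

For example, `bound_verifier((1, 4), 4)` returns `"t"` and `bound_verifier((1, 4), 3)` returns `"?"`.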


7.2 Reducing Analysis to Verification 


It turns out that the converse of Theorem 7.1 does not hold, namely a program analysis problem in general cannot be reduced to a verification problem. Instead, this reduction can always be done for finite abstract domains. Given a verifier $\mathcal{V} \in \mathbb{V}_A$, for any program $P \in \mathrm{Prog}$, let us define $\mathcal{V}_t(P) \triangleq \{a \in A \mid \mathcal{V}(P, a) = t\}$, namely, $\mathcal{V}_t(P)$ is the set of assertions proved by $\mathcal{V}$ for $P$. Also, given an assertion $a \in A$, we define $\uparrow a \triangleq \{a' \in A \mid a \le_\gamma a'\}$ as the set of assertions weaker than $a$. The following result provides a useful characterization of the equivalence between verifiers and analysers.


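Lemma 7.2. Let $\langle A, \gamma, \leq_A\rangle$ be an abstract domain, $\mathcal{A} \in \mathbb{A}_A$ and $V \in \mathbb{V}_A$. Then, $\mathcal{A} \cong V$ if and only if for any $P \in \mathit{Prog}$, $V_{\mathtt{t}}(P) = {\uparrow}\mathcal{A}(P)$.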


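Proof. By Definition 6.1, it turns out that $\mathcal{A} \sqsubseteq V$ iff for any $P$, $V_{\mathtt{t}}(P) \subseteq {\uparrow}\mathcal{A}(P)$, while we have that $V \sqsubseteq \mathcal{A}$ iff for any $P$, ${\uparrow}\mathcal{A}(P) \subseteq V_{\mathtt{t}}(P)$. Thus, $\mathcal{A} \cong V$ if and only if for any $P \in \mathit{Prog}$, $V_{\mathtt{t}}(P) = {\uparrow}\mathcal{A}(P)$.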


A consequence of Lemma 7.2 is that, given $V \in \mathbb{V}_A$, $V$ can be transformed into an equivalent analyser $\tau(V) \in \mathbb{A}_A$ if and only if for any program $P$, an assertion $a_P \in A$ exists such that $V_{\mathtt{t}}(P) = {\uparrow}a_P$. In this case, one can then define $\tau(V)(P) \triangleq a_P$.


Lemma 7.3. Let $\langle A, \gamma, \leq_A\rangle$ be an abstract domain and $V \in \mathbb{V}_A$. If $\mathcal{A} \in \mathbb{A}_A$ is such that $\mathcal{A} \cong V$ then: (1) $A \neq \varnothing$; (2) $V$ is not trivial; (3) $V$ is monotone.


Proof. (1) We observed just after Definition 4.1 that no analyser can be defined on the empty abstract domain. (2) If $V$ is trivial then there exists a program $Q \in \mathit{Prog}$ such that for any $a \in A$, $V(Q,a) = \mathtt{?}$, so that if $V \cong \mathcal{A}$ for some $\mathcal{A} \in \mathbb{A}_A$ then, from $V \sqsubseteq \mathcal{A}$ we would derive $V(Q,\mathcal{A}(Q)) = \mathtt{t}$, which is a contradiction. (3) Assume that $V$ is not monotone. Then, there exist $Q \in \mathit{Prog}$ and $a, a' \in A$ such that $a \in V_{\mathtt{t}}(Q)$, $a \leq_A a'$ but $a' \notin V_{\mathtt{t}}(Q)$. If $V \cong \mathcal{A}$, for some $\mathcal{A} \in \mathbb{A}_A$, then, by Lemma 7.2, $V_{\mathtt{t}}(Q) = {\uparrow}\mathcal{A}(Q)$, so that we would have that $a \in {\uparrow}\mathcal{A}(Q)$ but $a' \notin {\uparrow}\mathcal{A}(Q)$, which is a contradiction, since ${\uparrow}\mathcal{A}(Q)$ is upward closed.


We also observe that even for a nontrivial and monotone verifier $V \in \mathbb{V}_A$ on a finite abstract domain $A$, it is not guaranteed that an equivalent analyser exists. In fact, if an equivalent analyser $\mathcal{A}$ exists then, by Lemma 7.2, for any program $P$, $V_{\mathtt{t}}(P)$ must contain a least element, namely for any program $P$ there must exist a strongest assertion proved by $V$ for $P$.


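Example 7.4. Consider a sign domain such as $S \triangleq \{\mathbb{Z}_{\leq 0}, \mathbb{Z}_{\geq 0}, \mathbb{Z}\}$ where $\mathbb{Z}_{\leq 0} \leq_S \mathbb{Z}$ and $\mathbb{Z}_{\geq 0} \leq_S \mathbb{Z}$. For a program such as $Q \equiv x := 0$, a sound verifier $V \in \mathbb{V}_S$ may be able to prove all the assertions in $S$, namely $V_{\mathtt{t}}(Q) = S$. However, there exists no assertion $a_Q \in S$ such that $V_{\mathtt{t}}(Q) = {\uparrow}a_Q$. Hence, by Lemma 7.2, there exists no analyser in $\mathbb{A}_S$ which is equivalent to $V$. Also, if $S' \triangleq S \cup \{\mathbb{Z}_{=0}\}$, where $\mathbb{Z}_{=0} \leq_{S'} \mathbb{Z}_{\leq 0}$ and $\mathbb{Z}_{=0} \leq_{S'} \mathbb{Z}_{\geq 0}$, so that $S'$ is a meet-semilattice, and $V' \in \mathbb{V}_{S'}$ is a sound verifier such that $V'_{\mathtt{t}}(Q) = S' \smallsetminus \{\mathbb{Z}_{=0}\}$, still, by Lemma 7.2, there exists no analyser in $\mathbb{A}_{S'}$ which is equivalent to $V'$.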
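Example 7.4 can be checked mechanically on its finite order. The Python sketch below is an illustration, not part of the original development: the string names for assertions are our own, and each order is given explicitly as the covering pairs stated in the example (transitively closed by hand), plus reflexivity.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def least_element(elems: Set[str], leq: Callable[[str, str], bool]) -> Optional[str]:
    """Return an element below every other element of elems, or None."""
    for a in elems:
        if all(leq(a, b) for b in elems):
            return a
    return None


def minimal_elements(elems: Set[str], leq: Callable[[str, str], bool]) -> Set[str]:
    """Elements with nothing strictly below them inside elems."""
    return {a for a in elems
            if not any(leq(b, a) and not leq(a, b) for b in elems)}


def order_from(pairs: Iterable[Tuple[str, str]]) -> Callable[[str, str], bool]:
    """Reflexive order given by an explicit set of strict pairs."""
    rel = set(pairs)
    return lambda a, b: a == b or (a, b) in rel


# Sign domain S: Z<=0 <= Z and Z>=0 <= Z.
S = {"Z<=0", "Z>=0", "Z"}
leq_S = order_from({("Z<=0", "Z"), ("Z>=0", "Z")})

# S' adds Z=0 below both Z<=0 and Z>=0 (and hence below Z).
S_prime = S | {"Z=0"}
leq_S_prime = order_from({("Z<=0", "Z"), ("Z>=0", "Z"),
                          ("Z=0", "Z<=0"), ("Z=0", "Z>=0"), ("Z=0", "Z")})

# V proves every assertion of S for Q = (x := 0); V' proves all of S' but Z=0.
proved_V = set(S)
proved_V_prime = S_prime - {"Z=0"}

print(least_element(proved_V, leq_S))                # None
print(sorted(minimal_elements(proved_V, leq_S)))     # ['Z<=0', 'Z>=0']
print(least_element(proved_V_prime, leq_S_prime))    # None
```

Both proved sets have two incomparable minimal elements, so neither has the least element that Lemma 7.2 requires for an equivalent analyser.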


Definition 7.5. A verifier $V \in \mathbb{V}_A$ is finitely meet-closed when for any $P \in \mathit{Prog}$ and $a, a_1, a_2 \in A$, if $V(P,a_1) = \mathtt{t} = V(P,a_2)$ and $\gamma(a) = \gamma(a_1) \cap \gamma(a_2)$ then $V(P,a) = \mathtt{t}$. The following notation will be used: for any domain $A$,

$\mathbb{V}^{+}_A \triangleq \{V \in \mathbb{V}_A \mid V \text{ is nontrivial, monotone and finitely meet-closed}\}.$


Thus, finitely meet-closed verifiers can prove logical conjunctions of provable 
assertions. 
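Definition 7.5 can be tested pointwise, for a single program, when the domain is finite and concretizations can be compared. The sketch below makes an assumption not found in the text: each $\gamma(a)$ is modelled as a finite set over a small sample of integer values, standing in for the infinite concretizations of Example 7.4. Under that toy model, the verifier $V'$ of Example 7.4 fails the condition at $Q$, while a verifier that also proves $\mathbb{Z}_{=0}$ passes it.

```python
from itertools import product
from typing import Dict, FrozenSet, Set


def meet_closed_at(proved: Set[str], gamma: Dict[str, FrozenSet[int]]) -> bool:
    """Check Definition 7.5 for one program: if a1 and a2 are proved and
    gamma(a) == gamma(a1) & gamma(a2), then a must be proved too."""
    for a1, a2 in product(proved, repeat=2):
        meet = gamma[a1] & gamma[a2]
        for a, g in gamma.items():
            if g == meet and a not in proved:
                return False
    return True


# Toy concretizations over the sample values {-1, 0, 1} (an assumption).
gamma_S_prime = {
    "Z=0": frozenset({0}),
    "Z<=0": frozenset({-1, 0}),
    "Z>=0": frozenset({0, 1}),
    "Z": frozenset({-1, 0, 1}),
}

proved_V_prime = {"Z<=0", "Z>=0", "Z"}    # V' of Example 7.4 at Q = (x := 0)
proved_all = set(gamma_S_prime)           # a verifier that also proves Z=0

print(meet_closed_at(proved_V_prime, gamma_S_prime))  # False
print(meet_closed_at(proved_all, gamma_S_prime))      # True
```

At $Q$, $V'$ is nontrivial and its proved set is upward closed, so the missing conjunction $\mathbb{Z}_{\leq 0} \wedge \mathbb{Z}_{\geq 0}$ is exactly what keeps it outside $\mathbb{V}^{+}_{S'}$.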


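Theorem 7.6 (Reduction for Finite Domains). Let $\langle A, \gamma, \leq_A\rangle$ be a nonempty finite abstract domain. There exists a transform $\tau : \mathbb{V}^{+}_A \to \mathbb{A}_A$ such that:

(1) $\tau$ is a total recursive function such that for all $V \in \mathbb{V}^{+}_A$, $\tau(V) \cong V$;
(2) if $V \in \mathbb{V}^{+}_A$ is sound then $\tau(V)$ is sound;
(3) $\tau$ is monotonic;
(4) $\tau(V) \cong \tau(V') \Rightarrow V \cong V'$.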


Proof. (1) Let $A = \{a_1, \ldots, a_n\}$ be any enumeration of $A$, with $n \geq 1$. Given $V \in \mathbb{V}^{+}_A$, we define $\tau(V) : \mathit{Prog} \to A$ as follows:

    τ(V)(P) ≜
      r := undef;
      forall i ∈ 1..n do
        if (a_i ∈ V_t(P) ∧ (r = undef ∨ a_i ≤_A r)) then r := a_i;
      output r

Then, it turns out that $\tau$ is a total recursive function. Since $V$ is a total recursive function, $A$ is finite and $\leq_A$ is decidable, we have that $\tau(V)$ is a total recursive function, so that $\tau(V) \in \mathbb{A}_A$. Since $V$ is not trivial, for any $P \in \mathit{Prog}$, $V_{\mathtt{t}}(P) \neq \varnothing$. Also, since $A$ is finite and $V$ is finitely meet-closed, there exists some $a_P \in V_{\mathtt{t}}(P)$ such that $V_{\mathtt{t}}(P) \subseteq {\uparrow}a_P$, so that $\tau(V)(P)$ outputs some value in $A$. Moreover, since $V$ is monotone, ${\uparrow}a_P \subseteq V_{\mathtt{t}}(P)$, so that ${\uparrow}a_P = V_{\mathtt{t}}(P)$. Thus, the above procedure defining $\tau(V)(P)$ finds and outputs $a_P$. Hence, for any $P \in \mathit{Prog}$ and $a \in A$, $V(P,a) = \mathtt{t} \Leftrightarrow a \in V_{\mathtt{t}}(P) \Leftrightarrow a \in {\uparrow}a_P \Leftrightarrow a_P \leq_A a \Leftrightarrow \tau(V)(P) \leq_A a$, that is, $\tau(V) \cong V$ holds.

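(2) By Lemma 6.2, if $V$ is sound then the equivalent analyser $\tau(V)$ is sound as well.

(3) It turns out that $\tau$ is monotonic: if $V \sqsubseteq V'$ then, by definition, $V'_{\mathtt{t}}(P) \subseteq V_{\mathtt{t}}(P)$, so that, since $V_{\mathtt{t}}(P) = {\uparrow}\tau(V)(P)$ and $V'_{\mathtt{t}}(P) = {\uparrow}\tau(V')(P)$, we obtain $\tau(V)(P) \leq_A \tau(V')(P)$, namely $\tau(V) \sqsubseteq \tau(V')$ holds.

(4) Assume that $\tau(V) \cong \tau(V')$. Hence, for any $P \in \mathit{Prog}$, $\gamma(\tau(V)(P)) = \gamma(\tau(V')(P))$, so that, since $V_{\mathtt{t}}(P) = {\uparrow}\tau(V)(P)$ and $V'_{\mathtt{t}}(P) = {\uparrow}\tau(V')(P)$, we obtain $V_{\mathtt{t}}(P) = V'_{\mathtt{t}}(P)$, namely $V \cong V'$.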
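The procedure defining $\tau(V)(P)$ in the proof translates almost line by line into Python. In this sketch the finite domain is given as a list (the enumeration $a_1, \ldots, a_n$), `None` plays the role of `undef`, and the chain domain, order and verifier are hypothetical toy stand-ins (a chain, so finite meet-closure holds automatically); only the loop itself follows the proof.

```python
from typing import Callable, List, Optional

Verifier = Callable[[str, str], str]


def tau(verifier: Verifier, domain: List[str],
        leq: Callable[[str, str], bool]) -> Callable[[str], Optional[str]]:
    """Build an analyser from a verifier on a finite domain (proof of Theorem 7.6)."""
    def analyser(program: str) -> Optional[str]:
        r: Optional[str] = None                      # r := undef
        for a in domain:                             # forall i in 1..n
            if verifier(program, a) == "t" and (r is None or leq(a, r)):
                r = a                                # keep the strongest proved so far
        return r                                     # output r
    return analyser


# Toy chain domain Z=0 <= Z>=0 <= Z (hypothetical), listed in an arbitrary order.
RANK = {"Z=0": 0, "Z>=0": 1, "Z": 2}
DOMAIN = ["Z", "Z=0", "Z>=0"]


def leq_chain(a: str, b: str) -> bool:
    return RANK[a] <= RANK[b]


def toy_verifier(program: str, assertion: str) -> str:
    """Monotone toy verifier: proves everything at or above a fixed bound."""
    bound = "Z=0" if program == "x := 0" else ("Z>=0" if program == "x := 1" else "Z")
    return "t" if leq_chain(bound, assertion) else "?"


analyse = tau(toy_verifier, DOMAIN, leq_chain)
print(analyse("x := 0"), analyse("x := 1"), analyse("x := y"))  # Z=0 Z>=0 Z
```

The loop queries the verifier once per domain element, which is where finiteness of $A$ is used: on an infinite domain the same search would not terminate.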


An example of this reduction of verification to static analysis for finite 
domains is dataflow analysis as model checking shown in [31] (excluding Kil- 
dall’s constant propagation domain [16]). Let us now focus on infinite domains 
of assertions. 


Lemma 7.7. There exists a denumerable infinite abstract domain $\langle A, \gamma, \leq_A\rangle$ and a verifier $V \in \mathbb{V}^{+}_A$ such that for any analyser $\mathcal{A} \in \mathbb{A}_A$, $\mathcal{A} \not\cong V$.


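Proof. Let us consider the infinite domain $T \triangleq \mathbb{N} \cup \{\top\}$ together with the following concretization function: $\gamma(\top) = \mathit{Prog}$ and, for any $n \in \mathbb{N}$,

$\gamma(n) = \{P \in \mathit{Prog} \mid P \text{ on input } 0 \text{ converges in } n \text{ or fewer steps}\}$

where the number of steps is determined by a small-step operational semantics $\Rightarrow$, as recalled in Sect. 2. Thus, we have that if $n, m \in \mathbb{N}$ then $n \leq_T m$ iff $n \leq_{\mathbb{N}} m$, while $n \leq_T \top$. We define a function $V : \mathit{Prog} \times T \to \{\mathtt{t}, \mathtt{?}\}$ as follows:

$V(P,a) \triangleq \begin{cases} \mathtt{t} & \text{if } a = \top \\ \mathtt{t} & \text{if } a = n \text{ and } P \text{ on input } 0 \text{ converges in } n \text{ or fewer steps} \\ \mathtt{?} & \text{if } a = n \text{ and } P \text{ on input } 0 \text{ does not converge in } n \text{ or fewer steps} \end{cases}$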


Clearly, for any number $n \in \mathbb{N}$, the predicate "$P$ on input 0 converges in $n$ or fewer steps" is decidable, where the input 0 could be replaced by any other (finite set of) input value(s). Hence, $V$ turns out to be a total recursive function, that is, a verifier on the abstract domain $T$. In particular, let us remark that $V$ is a sound verifier. Moreover, $V$ is nontrivial, since, for any $P \in \mathit{Prog}$, $V(P,\top) = \mathtt{t}$, and monotone because if $V(P,n) = \mathtt{t}$ and $n \leq_T a$ then either $a = \top$ and $V(P,\top) = \mathtt{t}$ or $a = m$, so that $n \leq_{\mathbb{N}} m$ and therefore $V(P,m) = \mathtt{t}$. Clearly, $V$ is also finitely meet-closed, because if $V(P,a_1) = \mathtt{t} = V(P,a_2)$ and $\gamma(a) = \gamma(a_1) \cap \gamma(a_2)$ then either $a = a_1$ or $a = a_2$, so that $V(P,a) = \mathtt{t}$. Summing up, it turns out that $V \in \mathbb{V}^{+}_T$. Assume now, by contradiction, that there exists an analyser $\mathcal{A} \in \mathbb{A}_T$ such that $\mathcal{A} \cong V$. By Lemma 7.2, for any $P \in \mathit{Prog}$, we have that $V_{\mathtt{t}}(P) = {\uparrow}\mathcal{A}(P)$. Hence, if $P$ on input 0 diverges then $V_{\mathtt{t}}(P) = \{\top\}$ so that $\mathcal{A}(P) = \top$, while if $P$ on input 0 converges in exactly $n$ steps then $V_{\mathtt{t}}(P) = \{m \in \mathbb{N} \mid m \geq n\} \cup \{\top\}$, so $\mathcal{A}(P) = n$, namely $\mathcal{A}$ goes as follows:


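$\mathcal{A}(P) = \begin{cases} \top & \text{if } P \text{ on input } 0 \text{ diverges} \\ n & \text{if } P \text{ on input } 0 \text{ converges in exactly } n \text{ steps} \end{cases}$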


Since $\mathcal{A}$ is a total recursive function, we would have defined an algorithm $\mathcal{A}$ for deciding whether a program $P \in \mathit{Prog}$ on input 0 terminates or not. Since $\mathit{Prog}$ is assumed to be Turing complete with respect to the operational semantics $\Rightarrow$, this leads to a contradiction.
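The verifier $V$ of Lemma 7.7 is computable because it only ever simulates a bounded number of steps. The Python sketch below models programs as generator functions that yield once per small-step transition; this encoding of $\Rightarrow$, the `TOP` sentinel for $\top$, and the sample programs are assumptions made for illustration. The sketch deliberately stops short of computing the least $n$ with $V(P, n) = \mathtt{t}$: that would require an unbounded search, which is exactly the equivalent analyser the proof rules out.

```python
from typing import Callable, Iterator, Union

Program = Callable[[int], Iterator[None]]
TOP = "TOP"  # stands for the top assertion (all programs)


def converges_within(program: Program, n: int, inp: int = 0) -> bool:
    """Run at most n steps; True iff the program has terminated by then."""
    run = program(inp)
    for _ in range(n + 1):
        try:
            next(run)          # perform one more step
        except StopIteration:  # the program returned: it converged
            return True
    return False               # still running after n steps


def V(program: Program, a: Union[int, str]) -> str:
    """The verifier of Lemma 7.7: always terminates, answers "t" or "?"."""
    if a == TOP:
        return "t"
    return "t" if converges_within(program, a) else "?"


def halts_in_three(x: int) -> Iterator[None]:
    for _ in range(3):
        yield                  # three small steps, then return


def loops_forever(x: int) -> Iterator[None]:
    while True:
        yield


print(V(halts_in_three, 3), V(halts_in_three, 2))     # t ?
print(V(loops_forever, 1000), V(loops_forever, TOP))  # ? t
```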


As a direct consequence of Lemma 7.7, the following theorem proves that for any infinite abstract domain $A$, no reduction from verifiers in $\mathbb{V}^{+}_A$ to equivalent analysers in $\mathbb{A}_A$ is possible.


Theorem 7.8 (Impossibility of the Reduction for Infinite Domains). For any denumerable infinite abstract domain $\langle A, \gamma, \leq_A\rangle$, there exists no function $\tau : \mathbb{V}^{+}_A \to \mathbb{A}_A$ such that $\tau$ is a total recursive function and for all $V \in \mathbb{V}^{+}_A$, $\tau(V) \cong V$.


Proof. Assume, by contradiction, that $\tau : \mathbb{V}^{+}_A \to \mathbb{A}_A$ is a total recursive function such that for all $V \in \mathbb{V}^{+}_A$, $\tau(V) \in \mathbb{A}_A$ and $\tau(V) \cong V$. Then, for the infinite domain $A$ and verifier $V \in \mathbb{V}^{+}_A$ provided by Lemma 7.7, we would be able to construct an analyser $\tau(V) \in \mathbb{A}_A$ such that $\tau(V) \cong V$, which would contradict Lemma 7.7.


Intuitively, this result states that given any infinite abstract domain $A$, no general algorithm exists that constructively designs, out of a reasonable (i.e., nontrivial, monotone and finitely meet-closed) verifier $V$ on $A$, an equivalent analyser on the same domain $A$. This can be read as a precise statement proving the folklore belief that "program analysis is harder than verification", at least for infinite domains of program assertions. It is important to remark that the verifier $V \in \mathbb{V}^{+}_A$ on the infinite domain $A$ defined by the proof of Lemma 7.7 is sound. Thus, even if we restrict the reduction transform $\tau : \mathbb{V}^{+,\mathrm{sound}}_A \to \mathbb{A}^{\mathrm{sound}}_A$ of Theorem 7.8 to be applied to sound verifiers (so that, by Lemma 6.2, the range would be the sound analysers in $\mathbb{A}_A$), the same proof of Lemma 7.7 could still be used for proving that such a transform $\tau$ cannot exist.

A further consequence of Theorem 7.8 is the fact proved in [10] that abstract interpretation-based program analysis with infinite domains and widening/narrowing operators is strictly more powerful than with finite domains.
8 Conclusion and Future Work


We put forward a general model for studying static program analysers and veri- 
fiers from a computability perspective. This allowed us to state and prove, with 
simple arguments borrowed from standard computability theory, that for infi- 
nite abstract domains of program assertions, program analysis is a harder prob- 
lem than program verification. This is, to the best of our knowledge, the first 
formalization and proof of this popular belief, which also includes the relation- 
ship between type inference and type checking. We think that this foundational 
model can be extended to study further properties of program analysers and 
verifiers. In particular, this opens interesting perspectives in reasoning about 
program analysis and verification in a more abstract way towards a theory of 
computation that may include approximate methods, such as program analysers 
and verifiers, as objects of investigation, as suggested in [5,14]. For instance, the 
precision of program analysis and program verification, as well as their computa- 
tional complexity, are intensional program properties. Intensionally different but 
extensionally equivalent programs may exhibit completely different behaviours 
when analysed or verified. In this perspective, new intensional versions of Rice’s 
Theorem can be stated for program analysis, similarly to what is known for 
Blum’s complexity in [2]. Also, new models for reasoning about the space and 
time complexities of program analysis and verification algorithms can be stud- 
ied, especially for defining a notion of complexity class of program analysers and 
verifiers. 
Abstract. The problem of quantitative inclusion formalizes the goal of
comparing quantitative dimensions between systems such as worst-case 
execution time, resource consumption, and the like. Such systems are 
typically represented by formalisms such as weighted logics or weighted 
automata. Despite its significance in analyzing the quality of computing 
systems, the study of quantitative inclusion has mostly been conducted 
from a theoretical standpoint. In this work, we conduct the first empiri- 
cal study of quantitative inclusion for discounted-sum weighted automata 
(DS-inclusion, in short). 

Currently, two contrasting approaches for DS-inclusion exist: the 
linear-programming based DetLP and the purely automata-theoretic 
BCV. The theoretical complexity of DetLP is exponential in time and space, while that of BCV is PSPACE-complete. All practical implementations of BCV, however, are also exponential in time and space. Hence, it is not clear which of the two algorithms yields a superior implementation.

In this work we present the first implementations of these algorithms, and perform extensive experimentation to compare the two approaches. Our empirical analysis shows how the two approaches complement each other. This is a nuanced picture that is much richer than the one obtained from the theoretical study alone.


1 Introduction 


The analysis of quantitative dimensions of systems, such as worst-case execution 
time, energy consumption, and the like, has been studied thoroughly in recent 
times. By and large, these investigations have tended to be purely theoretical. 
While some efforts in this space [12,13] do deliver prototype tools, the area 
lacks a thorough empirical understanding of the relative performance of different 
but related algorithmic solutions. In this paper, we further such an empirical 
understanding for quantitative inclusion for discounted-sum weighted automata. 

Weighted automata [17] are a popular choice for system models in quantita- 
tive analysis. The problem of quantitative language inclusion [15] formalizes the 
goal of determining which of any two given systems is more efficient under such 
a system model. In a discounted-sum weighted automaton, the values of quantitative dimensions are computed by aggregating the costs incurred during each step of a system execution with discounted-sum aggregation. The discounted-sum (DS) function relies on the intuition that costs incurred in the near future are more "expensive" than costs incurred later on. Naturally, it is the aggregation of choice in applications in economics and game theory [20], Markov Decision Processes with discounted rewards [16], quantitative safety [13], and more.

The hardness of quantitative inclusion for nondeterministic DS-automata, or DS-inclusion, is evident from the PSPACE-hardness of the language-inclusion (LI) problem for nondeterministic Büchi automata [23]. Decision procedures for DS-inclusion were first investigated in [15], and subsequently through target discounted-sum [11] and DS-determinization [10]. A comparator-based argument [9] finally established its PSPACE-completeness. However, these theoretical advances in DS-inclusion have not been accompanied by the development of efficient and scalable tools and algorithms. This is the focus of this paper; our goal is to develop practical algorithms and tools for DS-inclusion.

Theoretical advances have led to two algorithmic approaches for DS-inclusion. The first approach, referred to as DetLP, combines automata-theoretic
reasoning with linear-programming (LP). This method first determinizes the 
DS-automata [10], and reduces the problem of DS-inclusion for determinis- 
tic DS-automata to LP [7,8]. Since determinization of DS-automata causes an 
exponential blow-up, DetLP yields an exponential time algorithm. An essen- 
tial feature of this approach is the separation of automata-theoretic reasoning— 
determinization—and numerical reasoning, performed by an LP-solver. Because of 
this separation, it does not seem easy to apply on-the-fly techniques to this app- 
roach and perform it using polynomial space, so this approach uses exponential 
time and space. 

In contrast, the second algorithm for DS-inclusion, referred to as BCV (after the names of its authors), is purely automata-theoretic [9]. The numerical reasoning about the costs of executions is handled by a special Büchi automaton, called the comparator, that enables an on-line comparison of the discounted-sum of a pair of weight-sequences. Aided by the comparator, BCV reduces DS-inclusion to language-equivalence between Büchi automata. Since language-equivalence is in PSPACE, BCV is a polynomial-space algorithm.

While the complexity-theoretic argument may seem to suggest a clear advantage for the pure automata-theoretic approach of BCV, the perspective from an implementation point of view is more nuanced. BCV relies on LI-solvers as its key algorithmic component. The polynomial-space approach for LI relies on Savitch's Theorem, which proves the equivalence between deterministic and non-deterministic space complexity [21]. This theorem, however, does not yield a practical algorithm. Existing efficient LI-solvers [3,4] are based on Ramsey-based inclusion testing [6] or rank-based approaches [18]. These tools actually use exponential time and space. In fact, the exponential blow-up of the Ramsey-based approach seems to be worse than that of DS-determinization. Thus, the theoretical advantage of BCV seems to evaporate upon close examination, and it is far from clear which algorithmic approach is superior. To resolve this issue, we provide in this paper the first implementations for both algorithms and perform an exhaustive empirical analysis to compare their performance.
Our first tool, also called DetLP, implements its namesake algorithm as it is. We rely on the existing LP-solver GLPSOL to perform numerical reasoning. Our second tool, called QuIP, starts from BCV, but improves on it. The key improvement arises from the construction of an improved comparator with fewer states. We revisit the reduction to language inclusion in [9] accordingly. The new reduction reduces the transition-density of the inputs to the LI-solver (transition density is the ratio of transitions to states), improving the overall performance of QuIP, since LI-solvers are known to scale better on inputs with lower transition-density [19].

Our empirical analysis reveals that theoretical complexity does not provide a 
full picture. Despite its poorer complexity, QuIP scales significantly better than 
DetLP, although DetLP solves more benchmarks. Based on these observations, we 
propose a method for DS-inclusion that leverages the complementary strengths of 
these tools to offer a scalable tool for DS-inclusion. Our evaluation also highlights 
the limitations of both approaches, and opens directions for further research in 
improving tools for DS-inclusion. 


2 Preliminaries 


Büchi Automata. A Büchi automaton [23] is a tuple $\mathcal{A} = (S, \Sigma, \delta, \mathit{Init}, \mathcal{F})$, where $S$ is a finite set of states, $\Sigma$ is a finite input alphabet, $\delta \subseteq (S \times \Sigma \times S)$ is the transition relation, $\mathit{Init} \subseteq S$ is the set of initial states, and $\mathcal{F} \subseteq S$ is the set of accepting states. A Büchi automaton is deterministic if for all states $s$ and inputs $a$, $|\{s' \mid (s,a,s') \in \delta\}| \leq 1$. Otherwise, it is nondeterministic. For a word $w = w_0 w_1 \ldots \in \Sigma^\omega$, a run $\rho$ of $w$ is a sequence of states $s_0 s_1 \ldots$ satisfying: (1) $s_0 \in \mathit{Init}$, and (2) $\tau_i = (s_i, w_i, s_{i+1}) \in \delta$ for all $i$. Let $\mathit{inf}(\rho)$ denote the set of states that occur infinitely often in run $\rho$. A run $\rho$ is an accepting run if $\mathit{inf}(\rho) \cap \mathcal{F} \neq \emptyset$. A word $w$ is an accepting word if it has an accepting run.
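For ultimately periodic words $w = u v^\omega$ and lasso-shaped runs, the acceptance condition $\mathit{inf}(\rho) \cap \mathcal{F} \neq \emptyset$ becomes a finite check, because the states occurring infinitely often are exactly those on the loop. The Python sketch below checks one such run against the definition; representing a run as a stem and a loop of states is our assumption for illustration, and a word is accepted if at least one of its runs passes.

```python
from typing import Hashable, List, Set, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]  # (source, letter, target)


def accepts_lasso_run(delta: Set[Transition], init: Set[Hashable],
                      accepting: Set[Hashable],
                      stem: List[Hashable], u: str,
                      loop: List[Hashable], v: str) -> bool:
    """Is stem.loop.loop... an accepting run of u v^omega?

    stem[i] reads u[i]; loop[j] reads v[j]; after the last loop state the
    run returns to loop[0]."""
    if len(stem) != len(u) or len(loop) != len(v) or not loop:
        raise ValueError("stem/loop must match u/v and the loop must be non-empty")
    states = stem + loop
    letters = u + v
    if states[0] not in init:                       # condition (1)
        return False
    for i, (s, a) in enumerate(zip(states, letters)):
        nxt = states[i + 1] if i + 1 < len(states) else loop[0]
        if (s, a, nxt) not in delta:                # condition (2)
            return False
    return bool(set(loop) & accepting)              # inf(rho) = states on the loop


# "Infinitely many a's" over {a, b}: q1 is entered exactly after reading a.
DELTA = {("q0", "a", "q1"), ("q0", "b", "q0"), ("q1", "a", "q1"), ("q1", "b", "q0")}
INIT, F = {"q0"}, {"q1"}

print(accepts_lasso_run(DELTA, INIT, F, [], "", ["q0", "q1"], "ab"))  # True
print(accepts_lasso_run(DELTA, INIT, F, [], "", ["q0"], "b"))         # False
```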

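The language $\mathcal{L}(\mathcal{A})$ of Büchi automaton $\mathcal{A}$ is the set of all words accepted by it. Büchi automata are known to be closed under set-theoretic union, intersection, and complementation. For Büchi automata $\mathcal{A}$ and $\mathcal{B}$, the language-equivalence and language-inclusion problems ask whether $\mathcal{L}(\mathcal{A}) = \mathcal{L}(\mathcal{B})$ and $\mathcal{L}(\mathcal{A}) \subseteq \mathcal{L}(\mathcal{B})$, respectively.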

Let $A = A[0], A[1], \ldots$ be a natural-number sequence, and let $d > 1$ be a rational number. The discounted-sum of $A$ with discount-factor $d$ is $DS(A, d) = \sum_{i=0}^{\infty} \frac{A[i]}{d^i}$. For number sequences $A$ and $B$, $(A, B)$ and $(A - B)$ denote the sequences where the $i$-th element is $(A[i], B[i])$ and $A[i] - B[i]$, respectively.
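The infinite sum $DS(A, d)$ has a closed form when $A$ is ultimately periodic, i.e. $A = u\,v\,v\,v \ldots$ for finite sequences $u$ and $v$: summing the geometric series over the repetitions of $v$ gives $DS(A,d) = \sum_{i<|u|} u[i]/d^i + \frac{1}{d^{|u|}} \cdot \frac{\sum_{j<|v|} v[j]/d^j}{1 - d^{-|v|}}$. The Python sketch below evaluates this exactly with rational arithmetic; restricting to ultimately periodic sequences is our assumption, chosen because they are the weight sequences produced by lasso-shaped runs of finite automata.

```python
from fractions import Fraction
from typing import Sequence


def ds_lasso(u: Sequence[int], v: Sequence[int], d) -> Fraction:
    """Exact DS(u v v v ..., d) for a rational discount factor d > 1."""
    d = Fraction(d)
    if d <= 1 or not v:
        raise ValueError("need d > 1 and a non-empty period v")
    prefix = sum(Fraction(x) / d**i for i, x in enumerate(u))
    period = sum(Fraction(x) / d**j for j, x in enumerate(v))
    # The period repeats forever: geometric series with ratio d^-|v|.
    tail = period / (1 - 1 / d**len(v))
    return prefix + tail / d**len(u)


print(ds_lasso([], [1], 2))       # 2
print(ds_lasso([2], [1], 2))      # 3
print(ds_lasso([], [1, 0], 2))    # 4/3
# (A - B) for A = 1 1 1 ... and B = 0 2 0 2 ...: equals DS(A) - DS(B)
print(ds_lasso([], [1, -1], 2))   # 2/3
```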


Discounted-Sum Automata. A discounted-sum automaton with discount-factor $d > 1$, DS-automaton in short, is a tuple $\mathcal{A} = (\mathcal{M}, \gamma)$, where $\mathcal{M} = (S, \Sigma, \delta, \mathit{Init}, S)$ is a Büchi automaton, and $\gamma : \delta \to \mathbb{N}$ is the weight function that assigns a weight to each transition of automaton $\mathcal{M}$. Words and runs in weighted $\omega$-automata are defined as they are in Büchi automata. Note that all states are accepting states in this definition. The weight sequence of run $\rho = s_0 s_1 \ldots$ of word $w = w_0 w_1 \ldots$ is given by $wt_\rho = n_0 n_1 n_2 \ldots$ where $n_i = \gamma(s_i, w_i, s_{i+1})$ for all $i$. The weight of a run $\rho$ is given by $DS(wt_\rho, d)$. For simplicity, we denote this by $DS(\rho, d)$. The weight of a word in DS-automata is defined as $wt_{\mathcal{A}}(w) = \sup\{DS(\rho, d) \mid \rho \text{ is a run of } w \text{ in } \mathcal{A}\}$. By convention, if a word $w \notin \mathcal{L}(\mathcal{A})$, then $wt_{\mathcal{A}}(w) = 0$ [15]. A DS-automaton is said to be complete if from every state there is at least one transition on every alphabet symbol. Formally, for all $p \in S$ and for all $a \in \Sigma$, there exists $q \in S$ s.t. $(p, a, q) \in \delta$. A run $\rho \in P$ of word $w \in \mathcal{L}(P)$ is a diminished run if there exists a run $\sigma \in Q$ over the same word $w$ s.t. $DS(\rho, d) \leq DS(\sigma, d)$. We abuse notation, and use $w \in \mathcal{A}$ to mean $w \in \mathcal{L}(\mathcal{A})$ for Büchi automaton or DS-automaton $\mathcal{A}$. We limit ourselves to integer discount-factors only. Given DS-automata $P$ and $Q$ and discount-factor $d > 1$, the discounted-sum inclusion problem, denoted by $P \subseteq_d Q$, determines whether for all words $w \in \Sigma^\omega$, $wt_P(w) \leq wt_Q(w)$.

[Figures: Fig. 1. System S; Fig. 2. Specification P]


Comparator Automata. For natural number $\mu$, integer discount-factor $d > 1$ and inequality relation $\leq$, the discounted-sum comparator $\mathcal{A}^{\mu,d}_{\leq}$, comparator in short, is a Büchi automaton that accepts (infinite) words over the alphabet $\{0, 1, \ldots, \mu-1\} \times \{0, 1, \ldots, \mu-1\}$ such that a pair $(A, B)$ of sequences is in $\mathcal{L}(\mathcal{A}^{\mu,d}_{\leq})$ iff $DS(A, d) \leq DS(B, d)$. Closure properties of Büchi automata ensure that comparators exist for all inequality relations [9].


Motivating Example. As an example of such a problem formulation, con- 
sider the system and specification in Figs. 1 and 2, respectively [15]. Here, the 
specification P depicts the worst-case energy-consumption model for a motor, 
and the system S is a candidate implementation of the motor. Transitions in S 
and P are labeled by transition-action and transition-cost. The cost of an exe- 
cution (a sequence of actions) is given by an aggregate of the costs of transitions 
along its run (a sequence of automaton states). In non-deterministic automata, where each execution may have multiple runs, the cost of the execution is the cost of the run with maximum cost. A critical question here is to check whether implementation S is more energy-efficient than specification P. This problem can be framed as a problem of quantitative inclusion between S and P.


3 Prior Work 


We discuss the existing algorithms for DS-inclusion, i.e., DetLP and BCV, in detail.


3.1 DetLP: DS-determinization and LP-based 


Boker and Henzinger studied the complexity of, and decision procedures for, determinization of DS-automata in detail [10]. They proved that a DS-automaton can be determinized if it is complete, all its states are accepting states and the discount-factor is an integer. Under all other circumstances, DS-determinization is not guaranteed to be possible. DS-determinization extends the subset construction for automata over finite words. Every state of the determinized DS-automaton is represented by an $|S|$-tuple of numbers, where $S = \{q_1, \ldots, q_{|S|}\}$ denotes the set of states of the original DS-automaton. The value stored in the $i$-th place in the $|S|$-tuple represents the "gap" or extra-cost of reaching state $q_i$ over a finite word $w$ compared to its best value so far. The crux of the argument lies in proving that when the DS-automaton is complete and the discount-factor is an integer, the "gap" can take only finitely many values, yielding finiteness of the determinized DS-automaton, albeit exponentially larger than the original.


Theorem 1 [10] [DS-determinization analysis]. Let $\mathcal{A}$ be a complete DS-automaton with maximum weight $\mu$ over transitions and $s$ number of states. DS-determinization of $\mathcal{A}$ generates a DS-automaton with at most $\mu^s$ states.


Chatterjee et al. reduced $P \subseteq_d Q$ between non-deterministic DS-automaton $P$ and deterministic DS-automaton $Q$ to linear programming [7,8,15]. First, the product DS-automaton $P \times Q$ is constructed so that $(s_P, s_Q) \xrightarrow{a} (t_P, t_Q)$ is a transition with weight $w_P - w_Q$ if transition $s_M \xrightarrow{a} t_M$ with weight $w_M$ is present in $M$, for $M \in \{P, Q\}$. $P \subseteq_d Q$ is False iff the weight of some word in $P \times Q$ is greater than 0. Since $Q$ is deterministic, it is sufficient to check if the maximum weight of all infinite paths from the initial state in $P \times Q$ is greater than 0. For discounted-sum, the maximum weight of paths from a given state can be determined by a linear program: each variable (one for each state) corresponds to the weight of paths originating in this state, and transitions decide the constraints which relate the values of variables (or states) on them. The objective is to maximize the weight of the variable corresponding to the initial state.
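The quantity the linear program computes, the maximum discounted weight of an infinite path from each state, is the unique fixed point of $x_s = \max_{(s, t, w)} (w + x_t / d)$ over the outgoing transitions of $s$. As a swapped-in alternative to an LP-solver (not the method of [7,8,15]), the Python sketch below approximates that fixed point by value iteration, which converges because each round shrinks the error by a factor of $1/d$. The product graph, given as `(source, target, weight)` triples with weights $w_P - w_Q$, is a hypothetical input format.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (source, target, w_P - w_Q)


def max_discounted_weight(edges: Iterable[Edge], d: float,
                          tol: float = 1e-12,
                          max_iter: int = 100_000) -> Dict[Hashable, float]:
    """Value iteration for x_s = max over edges (s, t, w) of w + x_t / d."""
    out: Dict[Hashable, List[Tuple[Hashable, float]]] = {}
    for s, t, w in edges:
        out.setdefault(s, []).append((t, w))
        out.setdefault(t, [])
    if any(not succ for succ in out.values()):
        raise ValueError("every state needs an outgoing edge (infinite paths only)")
    x = {s: 0.0 for s in out}
    for _ in range(max_iter):
        new = {s: max(w + x[t] / d for t, w in succ) for s, succ in out.items()}
        change = max(abs(new[s] - x[s]) for s in out)
        x = new
        if change < tol:
            break
    return x


# Toy product P x Q (hypothetical): from "a" either lose 1 and move to "b",
# or stay in "a" with weight 0; "b" gains 1 forever.
x = max_discounted_weight([("a", "b", -1), ("a", "a", 0), ("b", "b", 1)], d=2)
print(round(x["b"], 9), round(x["a"], 9))  # 2.0 0.0
```

Following the criterion above, inclusion fails exactly when the value at the initial state is greater than 0; because value iteration only approximates the fixed point in floating point, an exact decision near 0 calls for exact arithmetic such as a rational LP.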

Therefore, the DetLP method for $P \subseteq_d Q$ is as follows: determinize $Q$ to $Q_D$ via the DS-determinization method from [10], and reduce $P \subseteq_d Q_D$ to linear programming following [15]. Note that since determinization is possible only if the DS-automaton is complete, DetLP can be applied only if $Q$ is complete.


Lemma 1. Let $P$ and $Q$ be non-deterministic DS-automata with $s_P$ and $s_Q$ number of states respectively, and $\tau_P$ transitions in $P$. Let the alphabet be $\Sigma$ and maximum weight on transitions be $\mu$. Then $P \subseteq_d Q$ is reduced to linear programming with $O(s_P \cdot \mu^{s_Q})$ variables and $O(\tau_P \cdot \mu^{s_Q} \cdot |\Sigma|)$ constraints.


Anderson and Conitzer [7] proved that this system of linear equations can be solved in $O(m \cdot n^2)$ for $m$ constraints and $n$ variables. Therefore,


Theorem 2 [7,15] [Complexity of DetLP]. Let $P$ and $Q$ be DS-automata with $s_P$ and $s_Q$ number of states respectively, and $\tau_P$ transitions in $P$. Let the alphabet be $\Sigma$ and maximum weight on transitions be $\mu$. The complexity of DetLP is $O(s_P^2 \cdot \tau_P \cdot \mu^{s_Q} \cdot |\Sigma|)$.
1: Input: Weighted automata $P$, $Q$, and discount-factor $d$
2: Output: True if $P \subseteq_d Q$, False otherwise
3: $\hat{P} \leftarrow$ AugmentWtAndLabel($P$)
4: $\hat{Q} \leftarrow$ AugmentWtAndLabel($Q$)
5: $\hat{P} \times \hat{Q} \leftarrow$ MakeProductSameAlpha($\hat{P}, \hat{Q}$)
6: $\mu \leftarrow$ MaxWeight($P, Q$)
7: $\mathcal{A}^{\mu,d}_{\leq} \leftarrow$ MakeComparator($\mu, d$)
8: DimWithWitness $\leftarrow$ Intersect($\hat{P} \times \hat{Q}, \mathcal{A}^{\mu,d}_{\leq}$)
9: Dim $\leftarrow$ FirstProject(DimWithWitness)
10: return $\hat{P} \equiv$ Dim

Algorithm 1. BCV($P, Q, d$): Is $P \subseteq_d Q$?


3.2 BCV: Comparator-based approach 


The key idea behind BCV is that $P \subseteq_d Q$ holds iff every run of $P$ is a diminished run. As a result, BCV constructs an intermediate Büchi automaton Dim that consists of all diminished runs of $P$. It then checks whether Dim consists of all runs of $P$, by determining language-equivalence between Dim and an automaton $\hat{P}$ that consists of all runs of $P$. The comparator $\mathcal{A}^{\mu,d}_{\leq}$ is utilized in the construction of Dim to compare the weights of runs in $P$ and $Q$.

Strictly speaking, BCV as presented in [9] is a generic algorithm for inclusion under a general class of aggregate functions, called $\omega$-regular aggregate functions. Here, BCV (Algorithm 1) refers to its adaptation to DS. Procedure AugmentWtAndLabel distinguishes between runs of the same word in DS-automata by assigning a unique transition-identity to each transition. It also appends the transition weight, to enable weight comparison afterwards. Specifically, it transforms DS-automaton $\mathcal{A}$ into Büchi automaton $\hat{\mathcal{A}}$, with all states as accepting, by converting transition $\tau = s \xrightarrow{a} t$ with weight $wt$ and unique transition-identity $l$ to transition $\hat{\tau} = s \xrightarrow{(a, wt, l)} t$ in $\hat{\mathcal{A}}$. Procedure MakeProductSameAlpha($\hat{P}, \hat{Q}$) takes the product of $\hat{P}$ and $\hat{Q}$ over the same word, i.e., transitions $s_A \xrightarrow{(a, n_A, l_A)} t_A$ in $\hat{A}$, for $A \in \{P, Q\}$, generate transition $(s_P, s_Q) \xrightarrow{(a, n_P, l_P, n_Q, l_Q)} (t_P, t_Q)$ in $\hat{P} \times \hat{Q}$. The comparator $\mathcal{A}^{\mu,d}_{\leq}$ is constructed with upper-bound $\mu$ that equals the maximum weight of transitions in $P$ and $Q$, and discount-factor $d$. Intersect matches the alphabet of $\hat{P} \times \hat{Q}$ with $\mathcal{A}^{\mu,d}_{\leq}$, and intersects them. The resulting automaton DimWithWitness accepts word $(w, wt_P, id_P, wt_Q, id_Q)$ iff $DS(wt_P, d) \leq DS(wt_Q, d)$. The projection of DimWithWitness on the first three components of $\hat{P}$ returns Dim, which contains the word $(w, wt_P, id_P)$ iff it is a diminished run in $P$. Finally, language-equivalence between Dim and $\hat{P}$ returns the answer.
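The two preprocessing steps of Algorithm 1 are plain transition rewritings. The Python sketch below mirrors the descriptions above; encoding a DS-automaton as a list of `(source, letter, target, weight)` tuples and using list positions as transition-identities are our assumptions. MaxWeight, MakeComparator, Intersect and FirstProject are not sketched.

```python
from itertools import product
from typing import Hashable, List, Tuple

WeightedTransition = Tuple[Hashable, str, Hashable, int]             # (s, a, t, wt)
LabeledTransition = Tuple[Hashable, Tuple[str, int, int], Hashable]  # (s, (a, wt, id), t)


def augment_wt_and_label(transitions: List[WeightedTransition]) -> List[LabeledTransition]:
    """s -a-> t with weight wt becomes s -(a, wt, l)-> t, with a unique id l."""
    return [(s, (a, wt, l), t) for l, (s, a, t, wt) in enumerate(transitions)]


def make_product_same_alpha(p_hat: List[LabeledTransition],
                            q_hat: List[LabeledTransition]):
    """Pair transitions of P-hat and Q-hat that read the same letter a."""
    return [((sp, sq), (a, np_, lp, nq, lq), (tp, tq))
            for (sp, (a, np_, lp), tp), (sq, (b, nq, lq), tq) in product(p_hat, q_hat)
            if a == b]


P = [("p0", "a", "p0", 1), ("p0", "b", "p0", 0)]
Q = [("q0", "a", "q0", 2)]
PxQ = make_product_same_alpha(augment_wt_and_label(P), augment_wt_and_label(Q))
print(PxQ)  # [(('p0', 'q0'), ('a', 1, 0, 2, 0), ('p0', 'q0'))]
```

Every pair of same-letter transitions yields one product transition, which is where the $O(\tau_P \tau_Q)$ transition count of $\hat{P} \times \hat{Q}$ comes from.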

Unlike DetLP, BCV operates on incomplete DS-automata as well, and can be 
extended to DS-automata in which not all states are accepting. 
4 QuIP: BCV-based Solver for DS-inclusion


We investigate more closely why BCV does not lend itself to a practical implementation for DS-inclusion (Sect. 4.1). We identify its drawbacks, and propose an improved algorithm, QuIP, described in Sect. 4.3. QuIP improves upon BCV by means of a new optimized comparator that we describe in Sect. 4.2.


4.1 Analysis of BCV 


The proof for PSPACE-complexity of BCV relies on LI being in PSPACE. In practice, though, implementations of LI apply Ramsey-based inclusion testing [6], rank-based methods [18], etc. All of these algorithms are exponential in time and space in the worst case. Any implementation of BCV will have to rely on an LI-solver. Therefore, in practice BCV is also exponential in time and space. In fact, we show that its worst-case complexity (in practice) is poorer than that of DetLP.

Another reason that prevents BCV from practical implementations is that it does not optimize the size of intermediate automata. Specifically, we show that the size and transition-density of Dim, which is one of the inputs to the LI-solver, are very high (transition density is the ratio of transitions to states). Both of these parameters are known to be deterrents to the performance of existing LI-solvers [5], and consequently to BCV as well:


Lemma 2. Let $s_P$, $s_Q$, $s_d$ and $\tau_P$, $\tau_Q$, $\tau_d$ denote the number of states and transitions in $P$, $Q$, and $\mathcal{A}^{\mu,d}_{\leq}$, respectively. The number of states and transitions in Dim are $O(s_P s_Q s_d)$ and $O(\tau_P^2 \tau_Q^2 \tau_d |\Sigma|)$, respectively.


Proof. It is easy to see that the number of states and transitions of $\hat{P}$ and $\hat{Q}$ are the same as those of $P$ and $Q$, respectively. Therefore, the number of states and transitions in $\hat{P} \times \hat{Q}$ are $O(s_P s_Q)$ and $O(\tau_P \tau_Q)$, respectively. The alphabet of $\hat{P} \times \hat{Q}$ is of the form $(a, wt_1, id_1, wt_2, id_2)$ for $a \in \Sigma$, where $wt_1, wt_2$ are non-negative weights bounded by $\mu$ and $id_1, id_2$ are unique transition-ids in $P$ and $Q$, respectively. The alphabet of comparator $\mathcal{A}^{\mu,d}_{\leq}$ is of the form $(wt_1, wt_2)$. To perform the intersection of these two, the alphabet of the comparator needs to be matched to that of the product, causing a blow-up in the number of transitions in the comparator by a factor of $|\Sigma| \cdot \tau_P \cdot \tau_Q$. Therefore, the number of states and transitions in DimWithWitness and Dim is given by $O(s_P s_Q s_d)$ and $O(\tau_P^2 \tau_Q^2 \tau_d |\Sigma|)$.


The comparator is a non-deterministic Büchi automaton with $O(\mu^2)$ states over an alphabet of size $\mu^2$ [9]. Since the transition-density of a non-deterministic Büchi automaton is $\delta = |S| \cdot |\Sigma|$ in the worst case, the transition-density of the comparator is $O(\mu^4)$. Therefore,


Corollary 1. Let $s_P$, $s_Q$, $s_d$ denote the number of states in $P$, $Q$, $\mathcal{A}^{\mu,d}_{\leq}$, respectively, and $\delta_P$, $\delta_Q$ and $\delta_d$ be their transition-densities. The number of states and transition-density of Dim are $O(s_P s_Q \mu^2)$ and $O(\delta_P \delta_Q \tau_P \tau_Q \cdot \mu^4 \cdot |\Sigma|)$, respectively.
The corollary illustrates that the transition-density of Dim is very high even for small inputs. The blow-up in the number of transitions of DimWithWitness (hence Dim) occurs during alphabet-matching for Büchi automata intersection (Algorithm 1, Line 8). However, the blow-up can be avoided by performing intersection over a substring of the alphabet of $\hat{P} \times \hat{Q}$. Specifically, if $s_1 \xrightarrow{(a, n_P, id_P, n_Q, id_Q)} s_2$ and $t_1 \xrightarrow{(wt_1, wt_2)} t_2$ are transitions in $\hat{P} \times \hat{Q}$ and comparator $\mathcal{A}^{\mu,d}_{\leq}$ respectively, then $(s_1, t_1, i) \xrightarrow{(a, n_P, id_P, n_Q, id_Q)} (s_2, t_2, j)$ is a transition in the intersection iff $n_P = wt_1$ and $n_Q = wt_2$, where $j = (i+1) \bmod 2$ if either $s_1$ or $t_1$ is an accepting state, and $j = i$ otherwise. We call intersection over a substring of the alphabet IntersectSelectAlpha. The following is easy to prove:
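IntersectSelectAlpha only has to pair a product transition with comparator transitions carrying the same weight pair, so indexing the comparator by $(wt_1, wt_2)$ avoids ever materializing the extended alphabet. The Python sketch below is our illustration: for the flag bit it uses the textbook two-phase rule for Büchi intersection (wait for an accepting state of the first automaton, then of the second), a standard choice we substitute for the compact rule stated above; the tuple encodings of transitions are assumptions, and initial states are omitted.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple


def intersect_select_alpha(pxq: List[Tuple[Hashable, tuple, Hashable]],
                           comparator: List[Tuple[Hashable, Tuple[int, int], Hashable]],
                           f1: Set[Hashable], f2: Set[Hashable]):
    """Intersect P^xQ^ with a comparator, matching only the weight components.

    pxq transitions: (s1, (a, nP, idP, nQ, idQ), s2)
    comparator transitions: (t1, (wt1, wt2), t2)
    Returns (transitions, accepting states) of the intersection."""
    by_weights: Dict[Tuple[int, int], List[Tuple[Hashable, Hashable]]] = defaultdict(list)
    for t1, pair, t2 in comparator:
        by_weights[pair].append((t1, t2))
    transitions = []
    for s1, label, s2 in pxq:
        _, n_p, _, n_q, _ = label
        for t1, t2 in by_weights.get((n_p, n_q), []):  # n_P = wt1 and n_Q = wt2
            for i in (0, 1):
                if i == 0:
                    j = 1 if s1 in f1 else 0   # phase 0: wait for an F1 state
                else:
                    j = 0 if t1 in f2 else 1   # phase 1: wait for an F2 state
                transitions.append(((s1, t1, i), label, (s2, t2, j)))
    states = {src for src, _, _ in transitions} | {dst for _, _, dst in transitions}
    accepting = {st for st in states if st[2] == 0 and st[0] in f1}
    return transitions, accepting


pxq = [("p", ("a", 1, 0, 0, 0), "p")]
comp = [("c", (1, 0), "c"), ("c", (0, 1), "c")]
trans, acc = intersect_select_alpha(pxq, comp, f1={"p"}, f2={"c"})
print(len(trans), sorted(acc))  # 2 [('p', 'c', 0)]
```

The comparator transition labelled $(0, 1)$ never meets a product transition with weights $(1, 0)$, so it contributes nothing; with the extended-alphabet Intersect it would first be copied once per product letter.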


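Lemma 3. Let $\mathcal{A}_1 = \mathrm{Intersect}(\hat{P} \times \hat{Q}, \mathcal{A}^{\mu,d}_{\leq})$ and $\mathcal{A}_2 = \mathrm{IntersectSelectAlpha}(\hat{P} \times \hat{Q}, \mathcal{A}^{\mu,d}_{\leq})$. Intersect extends the alphabet of $\mathcal{A}^{\mu,d}_{\leq}$ to match the alphabet of $\hat{P} \times \hat{Q}$, and IntersectSelectAlpha selects a substring of the alphabet of $\hat{P} \times \hat{Q}$ as defined above. Then, $\mathcal{L}(\mathcal{A}_1) = \mathcal{L}(\mathcal{A}_2)$.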


IntersectSelectAlpha prevents the blow-up by $|\Sigma| \cdot \tau_P \cdot \tau_Q$, resulting in only $O(\tau_P \tau_Q \tau_d)$ transitions in Dim. Therefore,


Lemma 4 [Trans. Den. in BCV]. Let $\delta_P$, $\delta_Q$ denote the transition-densities of P and Q, resp., and $\mu$ be the upper bound for comparator $\mathcal{A}^{\mu,d}_{\le}$. The number of states and transition-density of Dim are $O(s_P s_Q \mu^2)$ and $O(\delta_P \delta_Q \cdot \mu^4)$, respectively.
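Reading transition-density as transitions per state, $\delta = \tau / s$ (as in the benchmark generator of Sect. 5.2, where N states carry $\lfloor N \cdot \delta \rfloor$ transitions), the bounds of Corollary 1 and Lemma 4 follow by dividing the transition count of Dim by its state count, using $s_A = O(\mu^2)$ and $\delta_A = \tau_A / s_A = O(\mu^4)$ for the comparator. This worked derivation is a sketch of that arithmetic only:

```latex
\begin{aligned}
\delta_{Dim}^{\mathrm{Intersect}}
  &= \frac{\tau_P \tau_Q \cdot \tau_A \cdot |\Sigma| \, \tau_P \tau_Q}{s_P \, s_Q \, s_A}
   = \delta_P \delta_Q \cdot \delta_A \cdot |\Sigma| \, \tau_P \tau_Q
   = O(\delta_P \delta_Q \tau_P \tau_Q \cdot \mu^4 \cdot |\Sigma|), \\
\delta_{Dim}^{\mathrm{IntersectSelectAlpha}}
  &= \frac{\tau_P \tau_Q \cdot \tau_A}{s_P \, s_Q \, s_A}
   = \delta_P \delta_Q \cdot \delta_A
   = O(\delta_P \delta_Q \cdot \mu^4).
\end{aligned}
```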


Language-equivalence is performed via tools for language-inclusion. The most effective tool for language-inclusion, RABIT [1], is based on Ramsey-based inclusion testing [6]. The worst-case complexity for $A \subseteq B$ via Ramsey-based inclusion testing is known to be $2^{O(n^2)}$ when B has n states. Therefore,


Theorem 3 [Practical complexity of BCV]. Let P and Q be DS-automata with $s_P$, $s_Q$ number of states respectively, and maximum weight on transitions be $\mu$. The worst-case complexity for BCV for integer discount-factor $d > 1$ when language-equivalence is performed via Ramsey-based inclusion testing is $2^{O(s_P^2 \cdot s_Q^2 \cdot \mu^4)}$.


Recall that the language-inclusion queries are $\hat{P} \subseteq \mathit{Dim}$ and $\mathit{Dim} \subseteq \hat{P}$. Since Dim has many more states than $\hat{P}$, the complexity of $\hat{P} \subseteq \mathit{Dim}$ dominates.

Theorems 2 and 3 demonstrate that the complexity of BCV (in practice) is worse than that of DetLP.


4.2 Baseline Automata: An Optimized Comparator 


The $2^{O(s^2)}$ dependence of BCV on the number of states s of the comparator motivates us to construct a more compact comparator. Currently, a comparator consists of $O(\mu^2)$ states for upper bound $\mu$ [9]. In this section, we introduce the related concept of baseline automata, which consist of only $O(\mu)$ states and have transition-density $O(\mu^2)$.
Definition 1 (Baseline automata). For natural number $\mu$, integer discount-factor $d > 1$ and relation R, for $R \in \{<, >, \le, \ge, =\}$, the DS baseline automaton $\mathcal{B}^{\mu,d}_{R}$, baseline in short, is a Büchi automaton that accepts (infinite) words over the alphabet $\{-(\mu-1), \ldots, \mu-1\}$ s.t. sequences $V \in \mathcal{L}(\mathcal{B}^{\mu,d}_{R})$ iff $DS(V,d) \; R \; 0$.


Semantically, a baseline automaton with upper bound $\mu$, discount-factor d and inequality relation R accepts the language of all integer sequences bounded by $\mu$ whose discounted-sum is related to 0 by the relation R. Baseline automata are also related to cut-point languages [14].

Since $DS(A,d) \le DS(B,d) \iff DS(A-B,d) \le 0$, $\mathcal{A}^{\mu,d}_{\le}$ accepts (A, B) iff $\mathcal{B}^{\mu,d}_{\le}$ accepts (A - B); hence the regularity of baseline automata follows straight away from the regularity of the comparator. In fact, the automaton for $\mathcal{B}^{\mu,d}_{\le}$ can be derived from $\mathcal{A}^{\mu,d}_{\le}$ by transforming the alphabet from (a, b) to (a - b) along every transition. The first benefit of the modified alphabet is that its size is reduced from $\mu^2$ to $2 \cdot \mu - 1$. In addition, it coalesces all transitions between any two states over alphabet (a + v, a), for all a, into one single transition over v, thereby also reducing transitions. However, this direct transformation results in a baseline with $O(\mu^2)$ states. We provide a construction of the baseline with only $O(\mu)$ states.

The key idea behind the construction of the baseline is that the discounted-sum of a sequence V can be treated as a number in base d, i.e. $DS(V,d) = \sum_{i=0}^{\infty} \frac{V[i]}{d^i} = (V[0].V[1]V[2]\ldots)_d$. So, when $DS(V,d) \le 0$, there exists a non-negative value C in base d s.t. V + C = 0 for arithmetic operations in base d. This value C can be represented by a non-negative sequence C s.t. $DS(C,d) + DS(V,d) = 0$. Arithmetic in base d over sequences C and V results in a sequence of carries X such that:


Lemma 5. Let V, C, X be number sequences and $d > 1$ be a positive integer such that the following equations hold:


1. When $i = 0$, $V[0] + C[0] + X[0] = 0$
2. When $i \ge 1$, $V[i] + C[i] + X[i] = d \cdot X[i-1]$


Then $DS(V,d) + DS(C,d) = 0$.
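Lemma 5 rests on a telescoping identity: summing the two equations with weights $1/d^i$ up to index n leaves $\sum_{i \le n} (V[i] + C[i])/d^i = -X[n]/d^n$, which vanishes in the limit whenever X stays bounded. The sketch below checks this with exact rational arithmetic; the helper names `carries` and `partial_ds` are hypothetical, and the example C was chosen by hand as the base-3 digits of $-DS(V,3) = 16/9$, so the carries return to 0 and the finite sums cancel exactly.

```python
from fractions import Fraction

def carries(V, C, d):
    """Carry sequence X of Lemma 5 for finite prefixes V and C:
    X[0] = -(V[0] + C[0]) and X[i] = d*X[i-1] - V[i] - C[i] for i >= 1."""
    X = []
    for i, (v, c) in enumerate(zip(V, C)):
        X.append(-(v + c) if i == 0 else d * X[-1] - v - c)
    return X

def partial_ds(seq, d, n):
    """Exact value of sum_{i=0}^{n} seq[i] / d**i."""
    return sum(Fraction(seq[i], d ** i) for i in range(n + 1))

V, C, d = [-2, 1, -1, 0, 0], [1, 2, 1, 0, 0], 3
X = carries(V, C, d)
# Telescoping identity: for every prefix length n the partial sums of
# V and C add up to -X[n] / d**n.
for n in range(len(V)):
    assert partial_ds(V, d, n) + partial_ds(C, d, n) == Fraction(-X[n], d ** n)
print(X)  # [1, 0, 0, 0, 0]
```

Because X ends in 0 here, the partial sums of V and C cancel exactly, mirroring the conclusion $DS(V,d) + DS(C,d) = 0$.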


In the construction of the comparator, it has been proven that when A and B are bounded non-negative integer sequences s.t. $DS(A,d) \le DS(B,d)$, the corresponding sequences C and X are also bounded integer-sequences [9]. The same argument carries over here: when V is a bounded integer sequence s.t. $DS(V,d) \le 0$, there exists a corresponding pair of bounded integer sequences C and X. In fact, the bounds used for the comparator carry over to this case as well. Sequence C is non-negative and is bounded by $\mu_C = \mu \cdot \frac{d}{d-1}$, since $-\mu_C$ is a lower bound on the discounted-sum of V, and integer-sequence X is bounded by $\mu_X = 1 + \frac{\mu}{d-1}$. On combining Lemma 5 with the bounds on X and C we get:


Lemma 6. Let V be an integer-sequence bounded by $\mu$ s.t. $DS(V,d) \le 0$. Then there exists an integer sequence X bounded by $1 + \frac{\mu}{d-1}$ s.t.

1. When $i = 0$, $0 \le -(X[0] + V[0]) \le \mu \cdot \frac{d}{d-1}$
2. When $i \ge 1$, $0 \le (d \cdot X[i-1] - V[i] - X[i]) \le \mu \cdot \frac{d}{d-1}$


Equations 1-2 from Lemma 6 have been obtained by expressing $C[i]$ in terms of $X[i]$, $X[i-1]$, $V[i]$ and d, and imposing the bounds $0 \le C[i] \le \mu_C = \mu \cdot \frac{d}{d-1}$ on the resulting expression. Therefore, Lemma 6 implicitly captures the conditions on C by expressing them only in terms of V, X and d for $DS(V,d) \le 0$ to hold.

In the construction of the baseline automaton, the values of $V[i]$ are part of the alphabet, and the upper bound $\mu$ and discount-factor d are the input parameters. The only unknowns are the values of $X[i]$. However, we know that $X[i]$ can take only finitely many values, i.e. integer values $|X[i]| \le \mu_X$. So, we store all possible values of $X[i]$ in the states. Hence, the state-space S comprises $\{(x) \mid |x| \le \mu_X\}$ and a start state s. Transitions between these states are possible iff the corresponding x-values and alphabet v satisfy the conditions of Eqs. 1-2 from Lemma 6. There is a transition from start state s to state (x) on alphabet v if $0 \le -(x + v) \le \mu \cdot \frac{d}{d-1}$, and from state (x) to state (x') on alphabet v if $0 \le (d \cdot x - v - x') \le \mu \cdot \frac{d}{d-1}$. All (x)-states are accepting. This completes the construction of the baseline automaton $\mathcal{B}^{\mu,d}_{\le}$. Clearly, $\mathcal{B}^{\mu,d}_{\le}$ has only $O(\mu)$ states.

Since Büchi automata are closed under set-theoretic operations, baseline automata are ω-regular for all other inequalities too. Moreover, baseline automata for all other inequalities also have $O(\mu)$ states. Therefore, for the sake of completeness, we extend $\mathcal{B}^{\mu,d}_{\le}$ to construct $\mathcal{B}^{\mu,d}_{<}$. For $DS(V,d) < 0$, we need $DS(C,d) > 0$ for the implicitly generated C. Since C is a non-negative sequence, it is sufficient that at least one value of C is non-zero. Therefore, all runs are diverted to non-accepting states $(x, \bot)$ using the same transitions while the value of c is zero, and move to accepting states (x) only once they witness a non-zero value for c. Formally,


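Construction. Let $\mu_C = \mu \cdot \frac{d}{d-1} \le 2 \cdot \mu$ and $\mu_X = 1 + \frac{\mu}{d-1}$. Then $\mathcal{B}^{\mu,d}_{<} = (S, \Sigma, \delta_d, Init, \mathcal{F})$, where

- $S = Init \cup \mathcal{F} \cup S_\bot$, where
  $Init = \{s\}$, $\mathcal{F} = \{x \mid |x| \le \mu_X\}$, and
  $S_\bot = \{(x, \bot) \mid |x| \le \mu_X\}$, where $\bot$ is a special character and $x \in \mathbb{Z}$.
- $\Sigma = \{v : |v| < \mu\}$, where v is an integer.
- $\delta_d \subseteq S \times \Sigma \times S$ is defined as follows:
  1. Transitions from start state s:
     i. $(s, v, x)$ for all $x \in \mathcal{F}$ s.t. $0 < -(x + v) \le \mu_C$
     ii. $(s, v, (x, \bot))$ for all $(x, \bot) \in S_\bot$ s.t. $x + v = 0$
  2. Transitions within $S_\bot$: $((x, \bot), v, (x', \bot))$ for all $(x, \bot), (x', \bot) \in S_\bot$ if $d \cdot x = v + x'$
  3. Transitions within $\mathcal{F}$: $(x, v, x')$ for all $x, x' \in \mathcal{F}$ if $0 \le d \cdot x - v - x' < d$
  4. Transitions between $S_\bot$ and $\mathcal{F}$: $((x, \bot), v, x')$ for $(x, \bot) \in S_\bot$, $x' \in \mathcal{F}$ if $0 < d \cdot x - v - x' < d$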

Theorem 4 [Baseline]. The Büchi automaton constructed above is the baseline $\mathcal{B}^{\mu,d}_{<}$ with upper bound $\mu$, integer discount-factor $d > 1$ and relation <.


The baseline automata for all inequality relations have $O(\mu)$ states, an alphabet of size $2 \cdot \mu - 1$, and transition-density $O(\mu^2)$.
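The construction can be exercised directly. The following is a minimal Python sketch under our own encoding (the names `make_baseline_lt` and `accepts_lasso` are hypothetical, not part of QuIP): it builds $\mathcal{B}^{\mu,d}_{<}$ rule by rule, with the value $c = d \cdot x - v - x'$ playing the role of the implicit $C[i]$, and decides acceptance of ultimately-zero words $V \cdot 0^\omega$. For such words $DS(V,d)$ is a finite sum, so the baseline should accept exactly when $DS(V,d) < 0$. The Büchi check is a plain search for an accepting state, reachable on 0s, that lies on a 0-cycle; no minimization is attempted.

```python
from fractions import Fraction
from itertools import product

def make_baseline_lt(mu, d):
    """Transition map of B_<^{mu,d}: (state, v) -> set of successors.
    States: 's' (start), ('F', x) accepting, ('B', x) for (x, ⊥)."""
    mu_C = Fraction(mu * d, d - 1)
    mu_X = 1 + Fraction(mu, d - 1)
    xs = range(-int(mu_X), int(mu_X) + 1)   # integers x with |x| <= mu_X
    delta = {}

    def add(src, v, dst):
        delta.setdefault((src, v), set()).add(dst)

    for v in range(-(mu - 1), mu):           # alphabet {-(mu-1), ..., mu-1}
        for x in xs:
            if 0 < -(x + v) <= mu_C:         # rule 1.i: C[0] > 0
                add('s', v, ('F', x))
            if x + v == 0:                   # rule 1.ii: C[0] == 0
                add('s', v, ('B', x))
        for x, x2 in product(xs, xs):
            c = d * x - v - x2               # the implicit value C[i]
            if c == 0:
                add(('B', x), v, ('B', x2))  # rule 2
            if 0 <= c < d:
                add(('F', x), v, ('F', x2))  # rule 3
            if 0 < c < d:
                add(('B', x), v, ('F', x2))  # rule 4
    return delta

def accepts_lasso(delta, prefix):
    """Büchi acceptance of the word prefix . 0^omega."""
    current = {'s'}
    for v in prefix:
        current = set().union(*(delta.get((q, v), set()) for q in current))
    reach, stack = set(current), list(current)  # everything reachable on 0s
    while stack:
        for r in delta.get((stack.pop(), 0), ()):
            if r not in reach:
                reach.add(r)
                stack.append(r)

    def on_zero_cycle(f):                    # can f return to itself on 0s?
        seen, stack = set(), list(delta.get((f, 0), ()))
        while stack:
            q = stack.pop()
            if q == f:
                return True
            if q not in seen:
                seen.add(q)
                stack.extend(delta.get((q, 0), ()))
        return False

    return any(q != 's' and q[0] == 'F' and on_zero_cycle(q) for q in reach)
```

For example, with $\mu = 2$ and $d = 2$, the word $(-1) \cdot 0^\omega$ (discounted-sum $-1$) is accepted, while $(1, -1) \cdot 0^\omega$ (discounted-sum $1/2$) is rejected.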
1: Input: Weighted automata P, Q, and discount-factor d
2: Output: True if $P \subseteq_d Q$, False otherwise
3: $\hat{P} \leftarrow$ AugmentWtAndLabel(P)
4: $\hat{Q} \leftarrow$ AugmentWt(Q)
5: $\hat{P} \times \hat{Q} \leftarrow$ MakeProductSameAlpha($\hat{P}$, $\hat{Q}$)
6: $\mathcal{A} \leftarrow$ MakeBaseline($\mu$, d, $\le$)
7: DimWithWitness $\leftarrow$ IntersectSelectAlpha($\hat{P} \times \hat{Q}$, $\mathcal{A}$)
8: Dim $\leftarrow$ ProjectOutWt(DimWithWitness)
9: $\hat{P}_{-wt} \leftarrow$ ProjectOutWt($\hat{P}$)
10: return $\hat{P}_{-wt} \subseteq$ Dim


Algorithm 2. QuIP(P, Q, d), Is $P \subseteq_d Q$?


4.3 QuIP: Algorithm Description 


The construction of the baseline leads to an implementation-friendly QuIP from BCV. The core focus of QuIP is to ensure that the intermediate automata are small and have fewer transitions, to assist the LI-solvers. Technically, QuIP differs from BCV by incorporating the baseline automaton and an appropriate IntersectSelectAlpha function, rendering QuIP a theoretical improvement over BCV. Like BCV, QuIP also determines all diminished runs of P. So, it disambiguates P by appending the weight and a unique label to each of its transitions. Since the identity of runs of Q is not important, we do not disambiguate between runs of Q; we only append the weight to each transition (Algorithm 2, Line 4). The baseline automaton is constructed for discount-factor d, maximum weight $\mu$ along transitions in P and Q, and the inequality $\le$. Since the alphabet of the baseline automaton consists of the integers between $-\mu$ and $\mu$, the alphabet of the product $\hat{P} \times \hat{Q}$ is adjusted accordingly. Specifically, the weight recorded along transitions in the product is taken to be the difference of the weight in $\hat{P}$ and that in $\hat{Q}$, i.e. if $s_1 \xrightarrow{a_1, wt_1, l} s_2$ and $t_1 \xrightarrow{a_2, wt_2} t_2$ are transitions in $\hat{P}$ and $\hat{Q}$ respectively, then $(s_1, t_1) \xrightarrow{a_1, wt_1 - wt_2, l} (s_2, t_2)$ is a transition in $\hat{P} \times \hat{Q}$ iff $a_1 = a_2$ (Algorithm 2, Line 5). In this case, IntersectSelectAlpha intersects the baseline automaton $\mathcal{A}$ and the product $\hat{P} \times \hat{Q}$ only on the weight component of the alphabet of $\hat{P} \times \hat{Q}$. Specifically, if $s_1 \xrightarrow{(a, wt_1, l)} s_2$ and $t_1 \xrightarrow{wt_2} t_2$ are transitions in $\hat{P} \times \hat{Q}$ and the baseline $\mathcal{A}$ respectively, then $(s_1, t_1, i) \xrightarrow{(a, wt_1, l)} (s_2, t_2, j)$ is a transition in the intersection iff $wt_1 = wt_2$, where $j = (i+1) \bmod 2$ if either $i = 0$ and $s_1$ is an accepting state or $i = 1$ and $t_1$ is an accepting state, and $j = i$ otherwise. The automata Dim and $\hat{P}_{-wt}$ are obtained by projecting out the weight component from the alphabets of DimWithWitness (whose alphabet is that of $\hat{P} \times \hat{Q}$) and $\hat{P}$, respectively: the alphabet is converted from (a, wt, l) to only (a, l). It is necessary to project out the weight component since in $\hat{P} \times \hat{Q}$ it represents the difference of weights, whereas in $\hat{P}$ it represents the absolute value of the weight.
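Lines 5, 7 and 8 of Algorithm 2 can be sketched on explicit transition lists. This is a minimal Python sketch under assumed encodings (the function names mirror Algorithm 2, but the data representation is hypothetical): $\hat{P}$ letters are (a, wt, l), $\hat{Q}$ letters are (a, wt), and baseline letters are integers. The phase bit i follows the standard two-phase Büchi intersection; the accepting states of the result, which are not built here, would be the phase-1 states whose baseline component is accepting.

```python
from itertools import product

# A transition is (src, letter, dst).

def make_product_same_alpha(p_trans, q_trans):
    """Synchronise on the letter a and record wt_P - wt_Q."""
    return [((s1, t1), (a1, w1 - w2, l), (s2, t2))
            for (s1, (a1, w1, l), s2), (t1, (a2, w2), t2) in product(p_trans, q_trans)
            if a1 == a2]

def intersect_select_alpha(pq_trans, pq_acc, base_trans, base_acc):
    """Match only the weight component of a product letter against the
    baseline letter; i is the phase bit of the Büchi intersection."""
    out = []
    for (s1, (a, wt, l), s2), (t1, v, t2) in product(pq_trans, base_trans):
        if wt != v:
            continue
        for i in (0, 1):
            toggle = (i == 0 and s1 in pq_acc) or (i == 1 and t1 in base_acc)
            j = (i + 1) % 2 if toggle else i
            out.append(((s1, t1, i), (a, wt, l), (s2, t2, j)))
    return out

def project_out_wt(trans):
    """Drop the weight component: letter (a, wt, l) becomes (a, l)."""
    return [(src, (a, l), dst) for src, (a, _wt, l), dst in trans]

# Toy usage with one-state automata (hypothetical data).
p = [('p0', ('a', 3, 'l1'), 'p0'), ('p0', ('b', 1, 'l2'), 'p0')]
q = [('q0', ('a', 1), 'q0')]
pq = make_product_same_alpha(p, q)   # one transition, weight 3 - 1 = 2
base = [('x0', 2, 'x0'), ('x0', -2, 'x0')]
dim_w = intersect_select_alpha(pq, {('p0', 'q0')}, base, {'x0'})
dim = project_out_wt(dim_w)
```

Since all product states are accepting in QuIP, phase 0 always toggles immediately, so the phase bit effectively tracks only visits to accepting baseline states.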

Finally, the language of Dim is equated with that of $\hat{P}_{-wt}$, which is the automaton generated from $\hat{P}$ after discarding weights from transitions. However, it is easy to prove that $\mathit{Dim} \subseteq \hat{P}_{-wt}$. Therefore, instead of checking language-equivalence between Dim and $\hat{P}_{-wt}$, it is sufficient to check whether $\hat{P}_{-wt} \subseteq \mathit{Dim}$. As a result, QuIP utilizes LI-solvers as a black-box to perform this final step.


Lemma 7 [Trans. Den. in QuIP]. Let $\delta_P$, $\delta_Q$ denote the transition-densities of P and Q, resp., and $\mu$ be the upper bound for baseline $\mathcal{B}^{\mu,d}_{\le}$. The number of states and transition-density of Dim are $O(s_P s_Q \mu)$ and $O(\delta_P \delta_Q \cdot \mu^2)$, respectively.


Theorem 5 [Practical complexity of QuIP]. Let P and Q be DS-automata with $s_P$, $s_Q$ number of states, respectively, and maximum weight on transitions be $\mu$. The worst-case complexity for QuIP for integer discount-factor $d > 1$ when language-equivalence is performed via Ramsey-based inclusion testing is $2^{O(s_P^2 \cdot s_Q^2 \cdot \mu^2)}$.


Theorem 5 demonstrates that while the complexity of QuIP (in practice) improves upon that of BCV (in practice), it is still worse than that of DetLP.
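Both practical bounds come from the same substitution: the dominant query is $\hat{P} \subseteq \mathit{Dim}$, so n in the $2^{O(n^2)}$ bound of Ramsey-based inclusion testing is the number of states of Dim from Lemmas 4 and 7. Side by side, the arithmetic is:

```latex
\begin{aligned}
\text{BCV:}\quad  n &= O(s_P s_Q \mu^2) &&\Rightarrow\; 2^{O(n^2)} = 2^{O(s_P^2 \cdot s_Q^2 \cdot \mu^4)}, \\
\text{QuIP:}\quad n &= O(s_P s_Q \mu)   &&\Rightarrow\; 2^{O(n^2)} = 2^{O(s_P^2 \cdot s_Q^2 \cdot \mu^2)}.
\end{aligned}
```

The improvement is therefore confined to the exponent of $\mu$; the quadratic dependence on $s_P s_Q$ in the exponent remains.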


5 Experimental Evaluation 


We provide implementations of our tools QuIP and DetLP and conduct experiments on a large number of synthetically generated benchmarks to compare their performance. We seek answers to the following questions: (1) Which tool has better performance, as measured by runtime and number of benchmarks solved? (2) How does a change in transition-density affect the performance of the tools? (3) How dependent are our tools on their underlying solvers?


5.1 Implementation Details 


We implement our tools QuIP and DetLP in C++, with compiler optimization O3 enabled. We implement our own library for all Büchi-automata and DS-automata operations, except for language-inclusion, for which we use the state-of-the-art LI-solver RABIT [4] as a black-box. We enable the -fast flag in RABIT, and tune its Java threads with Xss, Xms, Xmx set to 1 GB, 1 GB and 8 GB respectively. We use the large-scale LP-solver GLPSOL provided by GLPK (GNU Linear Programming Kit) [2] inside DetLP. We did not tune GLPSOL since it consumes a very small percentage of the total time in DetLP, as we see later in Fig. 4.

We also employ some implementation-level optimizations. Various steps of QuIP and DetLP, such as product, DS-determinization and baseline construction, involve the creation of new automaton states and transitions. We reduce their size by adding a new state only if it is reachable from the initial state, and a new transition only if it originates from such a state.
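Adding a state only when it is reached amounts to a worklist search from the initial state. A minimal Python sketch, assuming an automaton is given by a successor function (the helper `explore` and the toy automata are hypothetical, not the tools' C++ library), applied to an on-the-fly product:

```python
from collections import deque

def explore(initial, successors):
    """Worklist construction that only materialises reachable states.

    `successors(q)` yields (letter, q2) pairs. A state is added when it is
    first reached, and transitions are only created from added states.
    """
    states, trans = {initial}, []
    work = deque([initial])
    while work:
        q = work.popleft()
        for letter, q2 in successors(q):
            trans.append((q, letter, q2))
            if q2 not in states:
                states.add(q2)
                work.append(q2)
    return states, trans

# Toy automata: state 2 of A is unreachable from its initial state 0.
A = {0: [('a', 1)], 1: [('a', 0)], 2: [('a', 2)]}
B = {0: [('a', 0), ('b', 1)], 1: [('a', 1)]}

def product_successors(pair):
    s, t = pair
    return [(x, (s2, t2)) for x, s2 in A[s] for y, t2 in B[t] if x == y]

states, trans = explore((0, 0), product_successors)
# Only 2 of the 6 pairs in A x B are ever created: (0, 0) and (1, 0).
```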

The baseline automaton is constructed on the restricted alphabet of only those weights that appear in the product $\hat{P} \times \hat{Q}$, so as to include only the necessary transitions. We also reduce its size with the Büchi minimization tool Reduce [4].

Since all states of $\hat{P} \times \hat{Q}$ are accepting, we conduct the intersection so that it avoids doubling the number of product states. This can be done since it is sufficient to keep track of whether words visit accepting states in the baseline.
5.2 Benchmarks


To the best of our knowledge, there are no standardized benchmarks for DS-automata. We attempted to experiment with examples that appear in research
papers. However, these examples are too few and too small, and do not render 
an informative view of performance of the tools. Following a standard approach 
to performance evaluation of automata-theoretic tools [5,19,22], we experiment 
with our tools on randomly generated benchmarks. 


Random Weighted-Automata Generation. The parameters for our random weighted-automata generation procedure are the number of states N, transition-density δ and upper-bound μ for the weight on transitions. The states are represented by the set {0, 1, ..., N - 1}. All states of the weighted-automata are accepting, and they have a unique initial state 0. The alphabet for all weighted-automata is fixed to Σ = {a, b}. Weights on transitions range from 0 to μ - 1. For our experiments we only generate complete weighted-automata. These weighted-automata are generated only if the number of transitions $\lfloor N \cdot \delta \rfloor$ is greater than $N \cdot |\Sigma|$, since there must be at least one transition on each alphabet symbol from every state. We first complete the weighted-automaton by creating a transition from each state on every alphabet symbol; in this case, the destination state and weight are chosen randomly. The remaining $(\lfloor N \cdot \delta \rfloor - N \cdot |\Sigma|)$-many transitions are generated by selecting all parameters randomly, i.e. the source and destination states from {0, ..., N - 1}, the alphabet symbol from Σ, and the weight on the transition from {0, ..., μ - 1}.
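The generation procedure can be sketched as follows. This is a hedged Python sketch: the function name, its signature and the (src, letter, weight, dst) transition format are hypothetical, and, since the text does not say whether duplicate transitions are filtered, this sketch does not remove them.

```python
import random

def random_weighted_automaton(n, delta, mu, alphabet=('a', 'b'), seed=None):
    """Random complete weighted automaton over states 0..n-1.

    All states are accepting and 0 is initial. Returns a list of
    transitions (src, letter, weight, dst) with weights in {0, ..., mu-1}.
    """
    total = int(n * delta)                 # floor(N * delta) for delta > 0
    if total <= n * len(alphabet):
        raise ValueError("need more than N * |Sigma| transitions")
    rng = random.Random(seed)
    trans = []
    # Step 1: completeness, one transition per (state, letter) with a
    # random destination and weight.
    for src in range(n):
        for a in alphabet:
            trans.append((src, a, rng.randrange(mu), rng.randrange(n)))
    # Step 2: the remaining transitions have every component chosen randomly.
    for _ in range(total - n * len(alphabet)):
        trans.append((rng.randrange(n), rng.choice(alphabet),
                      rng.randrange(mu), rng.randrange(n)))
    return trans
```

For instance, N = 10 and δ = 2.5 yield 25 transitions, 20 of which guarantee completeness; δ = 2 is rejected because it leaves no room for the random transitions, which matches the lower bound of 2.5 used in the experiments.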


5.3 Design and Setup for Experimental Evaluation 


Our experiments were designed with the objective to compare DetLP and QuIP. Due to the lack of standardized benchmarks, we conduct our experiments on randomly generated benchmarks. Therefore, the parameters for $P \subseteq_d Q$ are the number of states $s_P$ and $s_Q$, transition-density δ, and maximum weight wt. We seek answers to the questions described at the beginning of Sect. 5.

Each instantiation of the parameter-tuple ($s_P$, $s_Q$, δ, wt) and a choice of tool between QuIP and DetLP corresponds to one experiment. In each experiment, the weighted-automata P and Q are randomly generated with the parameters ($s_P$, δ, wt) and ($s_Q$, δ, wt), respectively, and language-inclusion is performed by the chosen tool. Since all inputs are randomly generated, each experiment is repeated 50 times to obtain statistically significant data. Each experiment is run for a total of 1000 sec on a single node of a high-performance cluster. Each node of the cluster consists of two quad-core Intel Xeon processors running at 2.83 GHz, with 8 GB of memory per node. Experiments that do not terminate within the given time limit are assigned a runtime of ∞. We report the median of the runtime-data collected from all iterations of the experiment.
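The reporting rule can be sketched in a few lines; it also shows why, when only slightly more than half of the 50 runs succeed, the reported median reflects the slowest successful runs. The helper `median_runtime` is hypothetical and encodes a run that did not finish as `None`.

```python
import math
from statistics import median

def median_runtime(runtimes):
    """Median over all repetitions; an unfinished run (None) counts as infinity."""
    return median(math.inf if t is None else t for t in runtimes)

# 26 of 50 runs finish: the median averages the 25th and 26th fastest
# runs, i.e. the slowest successful ones.
print(median_runtime(list(range(1, 27)) + [None] * 24))   # 25.5
# 25 of 50 runs finish: the median is already infinite.
print(median_runtime(list(range(1, 26)) + [None] * 25))   # inf
```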

These experiments are scaled up by increasing the size of the inputs. The worst-case analysis of QuIP demonstrates that it is symmetric in $s_P$ and $s_Q$, making the algorithm impartial to which of the two inputs is scaled (Theorem 5). On the other hand, the complexity of DetLP is dominated by $s_Q$ (Theorem 2). Therefore, we scale up our experiments by increasing $s_Q$ only.
Since DetLP is restricted to complete automata, these experiments are conducted on complete weighted automata only. We collect data on the total runtime of each tool, the time consumed by the underlying solver, and the number of times each experiment terminates with the given resources. We experiment with $s_P$ = 10, δ ranging between 2.5-4 in increments of 0.5 (we take a lower-bound of 2.5 since |Σ| = 2), wt ∈ {4, 5}, $s_Q$ ranging from 0-1500 in increments of 25, and d = 3. These sets of experiments also suffice for testing the scalability of both tools.


5.4 Observations 


We first compare the tools based on the number of benchmarks each can solve. We also attempt to unravel the main cause of failure of each tool. Out of the 50 experiments for each parameter-value, DetLP consistently solves more benchmarks than QuIP for the same parameter-values (Fig. 3a-b)¹. The figures also reveal that both tools solve more benchmarks at lower transition-density. The most common reason (in fact, almost always the reason) for QuIP to fail before its timeout was memory-overflow inside RABIT during language-inclusion between $\hat{P}_{-wt}$ and Dim. On the other hand, the main cause of failure of DetLP was memory-overflow during DS-determinization and preprocessing of the determinized DS-automata before GLPSOL is invoked. This occurs due to the sheer size of the determinized DS-automata, which can very quickly become very large. These empirical observations indicate that the bottlenecks in QuIP and DetLP may be language-inclusion and explicit DS-determinization, respectively.

We investigate the above intuition by analyzing the runtime trends for both tools. Figure 4a plots the runtime for both tools. The plot shows that QuIP fares significantly better than DetLP in runtime at δ = 2.5. The plots for both tools on a log scale seem curved (Fig. 4a), suggesting a sub-exponential runtime complexity. This was observed at higher δ as well. However, at higher δ we observe very few outliers on the runtime-trend graphs of QuIP at larger inputs, when just slightly more than 50% of the runs are successful. This is expected since, effectively, the median reports the runtime of the slower runs in these cases. Figure 4b records the ratio of total time spent inside RABIT and GLPSOL. The plot reveals that QuIP spends most of its time inside RABIT. We also observe that most of the memory consumption in QuIP occurs inside RABIT. In contrast, GLPSOL consumes a negligible amount of time and memory in DetLP. Clearly, the performance of QuIP and DetLP is dominated by RABIT and explicit DS-determinization, respectively. We also determined how the runtime performance of the tools changes with increasing discount-factor d. Both tools consume less time as d increases.

Finally, we test the scalability of both tools. In Fig. 5a, we plot the median of the total runtime as $s_Q$ increases at δ = 2.5, 3 ($s_P$ = 10, μ = 4) for QuIP. We attempt to best-fit the data-points for each δ with functions that are linear, quadratic and cubic in $s_Q$ using the least-squares method. Figure 5b does the same for DetLP. We observe that QuIP and DetLP are best fit by functions that are linear and quadratic in $s_Q$, respectively.


¹ Figures are best viewed online and in color.

Fig. 3. Number of benchmarks solved out of 50 as $s_Q$ increases with $s_P$ = 10, μ = 4. δ = 2.5 and δ = 4 in Fig. 3a and b, respectively.

Fig. 4. Time trends: Fig. 4a plots total runtime as $s_Q$ increases, with $s_P$ = 10, μ = 4, δ = 2.5. Figure shows median-time for each parameter-value. Figure 4b plots the ratio of time spent by each tool inside its solver at the same parameter values.


Inferences and Discussion. Our empirical analysis arrives at conclusions that a purely theoretical exploration would not have. First of all, we observe that despite having the worse theoretical complexity, the median-time complexity of QuIP is better than that of DetLP by an order of n. In theory, QuIP scales exponentially in $s_Q$, but in runtime it scales only linearly in $s_Q$. Similarly, the runtime of DetLP scales quadratically in $s_Q$. The huge margin of complexity difference emphasizes why solely theoretical analysis of algorithms is not sufficient.

Earlier empirical analysis of LI-solvers had made us aware of their dependence on transition-density δ. As a result, we were able to design QuIP cognizant of parameter δ. Therefore, its runtime dependence on δ is not surprising. However, our empirical analysis also reveals a runtime dependence of DetLP on δ. This is unexpected since δ does not appear in any complexity-theoretic analysis of DetLP (Theorem 1). We suspect this behavior occurs because the creation of each transition, say on alphabet a, during DS-determinization requires the procedure to analyze every transition on alphabet a in the original DS-automaton.
Fig. 5. Scalability of QuIP (Fig. 5a) and DetLP (Fig. 5b) at δ = 2.5, 3. Figures show median-time for each parameter-value.


The higher the transition-density, the more transitions there are in the original DS-automaton, and hence the more expensive the creation of transitions during DS-determinization.

We have already noted that the performance of QuIP is dominated by RABIT in space and time. Currently, RABIT is implemented in Java. Although RABIT surpasses all other LI-solvers in overall performance, we believe it can be improved significantly via a more space-efficient implementation in a more performance-oriented language like C++. This would, in turn, enhance QuIP.

The current implementation of DetLP utilizes the vanilla algorithm for DS- 
determinization. Since DS-determinization dominates DetLP, there is certainly 
merit in designing efficient algorithms for DS-determinization. However, we suspect this will be of limited advantage to DetLP since it will continue to incur the
complete cost of explicit DS-determinization due to the separation of automata- 
theoretic and numeric reasoning. 

Based on our observations, we propose to exploit the complementary strengths of both tools: first, apply QuIP with a small timeout; since DetLP solves more benchmarks, apply DetLP only if QuIP fails.
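The proposed combination is a simple sequential portfolio. The sketch below assumes each tool is wrapped as a callable that takes a time budget in seconds and returns True, False, or None on timeout or memory-overflow; this interface, the name `portfolio`, and the way the remaining budget is split are all hypothetical, not something either tool is known to expose.

```python
from typing import Callable, Optional

# Hypothetical wrapper type: budget in seconds -> answer, or None on failure.
Solver = Callable[[float], Optional[bool]]

def portfolio(quip: Solver, detlp: Solver,
              quip_budget: float, total_budget: float) -> Optional[bool]:
    """Run QuIP with a small timeout; fall back to DetLP only if it fails."""
    answer = quip(quip_budget)
    if answer is not None:
        return answer
    return detlp(total_budget - quip_budget)
```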


6 Concluding Remarks and Future Directions 


This paper presents the first empirical evaluation of algorithms and tools for DS- 
inclusion. We present two tools DetLP and QuIP. Our first tool DetLP is based 
on explicit DS-determinization and linear programming, and renders an expo- 
nential time and space algorithm. Our second tool QuIP improves upon a pre- 
viously known comparator-based automata-theoretic algorithm BCV by means 
of an optimized comparator construction, called baseline automata. Despite its
PSPACE-complete theoretical complexity, we note that all practical implemen- 
tations of QuIP are also exponential in time and space. 

The focus of this work is to investigate these tools in practice. In theory, 
the exponential complexity of QuIP is worse than DetLP. Our empirical evalu- 
ation reveals the opposite: The median-time complexity of QuIP is better than 
DetLP by an order of n. Specifically, QuIP scales linearly while DetLP scales 
quadratically in the size of inputs. This re-asserts the gap between theory and 
practice, and asserts the need for better metrics for practical algorithms. Further empirical analysis by scaling the right-hand side automaton will be beneficial.

Nevertheless, DetLP consistently solves more benchmarks than QuIP. Most 
of QuIP’s experiments fail due to memory-overflow within the LI-solver, indicating that more space-efficient implementations of LI-solvers would boost QuIP’s
performance. We are less optimistic about DetLP though. Our evaluation high- 
lights the impediment of explicit DS-determinization, a cost that is unavoidable 
in DetLP’s separation-of-concerns approach. This motivates future research that 
integrates automata-theoretic and numerical reasoning by perhaps combining 
implicit DS-determinization with baseline automata-like reasoning to design an
on-the-fly algorithm for DS-inclusion. 

Last but not least, our empirical evaluations led to the discovery of a dependence of the runtime of algorithms on parameters that had not featured in their
worst-case theoretical analysis, such as the dependence of DetLP on transition- 
density. Such evaluations build deeper understanding of algorithms, and will 
hopefully serve as a guiding light for theoretical and empirical investigation, in tandem, of algorithms for quantitative analysis.
Abstract. The design of security protocols is extremely subtle and vul-
nerable to potentially devastating flaws. As a result, many tools and 
techniques for the automated verification of protocol designs have been 
developed. Unfortunately, these tools don’t have the ability to model and 
reason about protocols with randomization, which are becoming increas- 
ingly prevalent in systems providing privacy and anonymity guarantees. 
The security guarantees of these systems are often formulated by means 
of the indistinguishability of two protocols. In this paper, we give the 
first practical algorithms for model checking indistinguishability proper- 
ties of randomized security protocols against the powerful threat model of 
a bounded Dolev-Yao adversary. Our techniques are implemented in the 
Stochastic Protocol ANalyzer (SPAN) and evaluated on several examples. As part of our evaluation, we conduct the first automated analysis
of an electronic voting protocol based on the 3-ballot design. 


1 Introduction 


Security protocols are highly intricate and vulnerable to design flaws. This 
has led to a significant effort in the construction of tools for the auto- 
mated verification of protocol designs. In order to make automation feasi- 
ble [8,12,15,23,34,48,55], the analysis is often carried out in the Dolev-Yao 
threat model [30], where the assumption of perfect cryptography is made. In the 
Dolev-Yao model, the omnipotent adversary has the ability to read, intercept, 
modify and replay all messages on public channels, remember the communication 
history as well as non-deterministically inject its own messages into the network 
while remaining anonymous. In this model, messages are symbolic terms modulo 
an equational theory (as opposed to bit-strings) and cryptographic operations
are modeled via equations in the theory. 

A growing number of security protocols employ randomization to achieve pri- 
vacy and anonymity guarantees. Randomization is essential in protocols/systems 
for anonymous communication and web browsing such as Crowds [49], mix- 
networks [21], onion routers [37] and Tor [29]. It is also used in fair exchange [11, 
35], vote privacy in electronic voting [6,20,52,54] and denial of service preven- 
tion [40]. In the example below, we demonstrate how randomization is used to 
achieve privacy in electronic voting systems. 


Example 1. Consider a simple electronic voting protocol for 2 voters Alice and 
Bob, two candidates and an election authority. The protocol is as follows. Initially, the election authority will generate two private tokens $t_A$ and $t_B$ and
send them to Alice and Bob encrypted under their respective public keys. These 
tokens will be used by the voters as proofs of their eligibility. After receiving 
a token, each voter sends his/her choice to the election authority along with 
the proof of eligibility encrypted under the public key of the election authority. 
Once all votes have been collected, the election authority tosses a fair private 
coin. The order in which Alice and Bob’s votes are published depends on the 
result of this coin toss. Vote privacy demands that an adversary not be able to 
deduce how each voter voted. 


All the existing Dolev-Yao analysis tools are fundamentally limited to proto- 
cols that are purely non-deterministic, where non-determinism models concur- 
rency as well as the interaction between protocol participants and their envi- 
ronment. There are currently no analysis tools that can faithfully reason about 
protocols like those in Example 1, a limitation that has long been identified by 
the verification community. In the context of electronic voting protocols, [28] 
identifies three main classes of techniques for achieving vote privacy: blind signature schemes, homomorphic encryption and randomization. There the authors
concede that protocols based on the latter technique are “hard to address with 
our methods that are purely non-deterministic.” Catherine Meadows, in her 
summary of the over 30-year history of formal techniques in cryptographic protocol analysis [46,47], identified the development of formal analysis techniques
for anonymous communication systems, almost exclusively built using primitives 
with randomization, as a fundamental and still largely unsolved challenge. She 
writes, “it turned out to be difficult to develop formal models and analyses of 
large-scale anonymous communication. The main stumbling block is the threat 
model”. 

In this work, we take a major step towards overcoming this long-standing 
challenge and introduce the first techniques for automated Dolev-Yao analysis of randomized security protocols. In particular, we propose two algorithms for determining indistinguishability of randomized security protocols and implement them in the Stochastic Protocol ANalyzer (SPAN). Several works [7,9,28,32,41] have identified indistinguishability as the natural mechanism to
model security guarantees such as anonymity, unlinkability, and privacy. Con- 
sider the protocol from Example 1, designed to preserve vote privacy. Such a 
property holds if the executions of the protocol in which Alice votes for candidate
1 and Bob votes for candidate 2 cannot be distinguished from the executions of 
the protocol in which Alice votes for candidate 2 and Bob votes for candidate 1. 

Observe that in Example 1, it is crucial that the result of the election author- 
ity’s coin toss is not visible to the adversary. Indeed if the adversary is allowed to 
“observe” the results of private coin tosses, then the analysis may reveal “secu- 
rity flaws” in correct security protocols (see examples in [13,17,19,22,36]). Thus, 
many authors [10,13,17-19,22,26,36] have proposed that randomized protocols 
be analyzed with respect to adversaries that are forced to schedule the same 
action in any two protocol executions that are indistinguishable to them. 
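A toy calculation makes the role of the private coin in Example 1 concrete. It is a deliberate simplification, not the POMDP semantics described next: it ignores encryption, tokens and the adversary's scheduling, and compares only the distribution of what is visible at the end (the published votes, plus the coin when it is observable) for the two vote assignments. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def final_view_distribution(vote_alice, vote_bob, coin_visible):
    """Distribution over the adversary's final view in a toy version of
    Example 1: the published vote order, plus the coin if it is visible."""
    dist = Counter()
    for coin in (0, 1):                     # fair coin tossed by the authority
        order = (vote_alice, vote_bob) if coin == 0 else (vote_bob, vote_alice)
        view = (order, coin) if coin_visible else (order,)
        dist[view] += Fraction(1, 2)
    return dist

# Private coin: (Alice=1, Bob=2) and (Alice=2, Bob=1) look identical.
print(final_view_distribution(1, 2, False) == final_view_distribution(2, 1, False))  # True
# Observable coin: the two assignments become distinguishable.
print(final_view_distribution(1, 2, True) == final_view_distribution(2, 1, True))    # False
```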

For randomized security protocols, [10,18,53] have proposed that trace equivalence from the applied π-calculus [5] serve as the indistinguishability relation
on traces. In this framework, the protocol semantics are described by partially 
observable Markov decision processes (POMDPs) where the adversary’s actions 
are modeled non-deterministically. The adversary is required to choose its next 
action based on the partial information that it can observe about the execution 
thus far. This allows us to model the privacy of coin tosses. Two security pro- 
tocols are said to be indistinguishable [18,53] if their semantic descriptions as 
POMDPs are indistinguishable. Two POMDPs M and M′ are said to be indistinguishable if, for any adversary A and trace $\bar{o}$, the probabilities of the executions that generate the trace $\bar{o}$ with respect to A are the same for both M and M′.

Our algorithms for indistinguishability in randomized security protocols are 
built on top of techniques for solving indistinguishability in finite POMDPs. 
Our first result shows that indistinguishability of finite POMDPs is P-complete. 
Membership in P is established by a reduction of POMDP indistinguishability 
to equivalence in probabilistic finite automata (PFAs), which is known to be P- 
complete [31,45,57]. Further, we show that the hardness result continues to hold 
for acyclic POMDPs. An acyclic POMDP is a POMDP that has a set of “final” 
absorbing states and the only cycles in the underlying graph are self-loops on 
these states. 

For acyclic finite POMDPs, we present another algorithm for checking indis- 
tinguishability based on the technique of translating a POMDP M into a fully 
observable Markov decision process (MDP), known as the belief MDP B(M) of 
M. It was shown in [14] that two POMDPs are indistinguishable if and only if 
the belief MDPs they induce are bisimilar as labeled Markov decision processes. 
When M is acyclic and finite, its belief MDP B(M) is finite and acyclic, and its bisimulation relation can be checked recursively.

Protocols in SPAN are described by a finite set of roles (agents) that interact asynchronously by passing messages. Each role models an agent in a protocol session and hence we only consider a bounded number of sessions. An action in a role performs either a message input, a message output or a test on messages. The adversary schedules the order in which these actions are executed and generates input recipes comprised of public information and messages previously output by the agents. In general, there is an unbounded number of input recipes available at each input step, resulting in POMDPs that are infinitely branching. SPAN, however, searches for bounded attacks by bounding the size of attacker messages. Under this assumption, protocols give rise to finite acyclic POMDPs. Even with this assumption, protocols specified in SPAN describe POMDPs that are exponentially larger than their description. Nevertheless, we show that when considering protocols defined over subterm convergent equational theories, indistinguishability of randomized security protocols is in PSPACE for bounded Dolev-Yao adversaries. We further show that the problem is harder than #SAT_D and hence it is both NP-hard and coNP-hard.

The main engine of SPAN translates a randomized security protocol into 
an acyclic finite POMDP by recursively unrolling all protocol executions and 
grouping states according to those that are indistinguishable. We implemented 
two algorithms for checking indistinguishability in SPAN. The first algorithm, 
called the PFA algorithm, checks indistinguishability of P and P’ by converting 
them to corresponding PFAs A and A’ as in the proof of decidability of indis- 
tinguishability of finite POMDPs. PFA equivalence can then be solved through 
a reduction to linear programming [31]. The second algorithm, called the on- 
the-fly (OTF) algorithm, is based on the technique of checking bisimulation of 
belief MDPs. Although asymptotically less efficient than the PFA algorithm, 
the recursive procedure for checking bisimulation in belief MDPs can be embed- 
ded into the main engine of SPAN with little overhead, allowing one to analyze 
indistinguishability on-the-fly as the POMDP models are constructed. 

In our evaluation of the indistinguishability algorithms in SPAN, we conduct the first automated Dolev-Yao analysis for several new classes of security protocols, including dining cryptographers networks [38], mix networks [21] and a 3-ballot electronic voting protocol [54]. The analysis of the 3-ballot protocol, in particular, demonstrates that our techniques can push symbolic protocol verification to new frontiers. The protocol is a full-scale, real-world example which, to the best of our knowledge, has not been analyzed using any existing probabilistic model checker or protocol analysis tool.


Summary of Contributions. We showed that the problem of checking indistinguishability of POMDPs is P-complete. The indistinguishability problem for bounded instances of randomized security protocols over subterm convergent equational theories (bounded number of sessions and bounded adversarial non-determinism) is shown to be in PSPACE and #SAT_D-hard. We proposed and implemented two algorithms in the SPAN protocol analysis tool for deciding indistinguishability in bounded instances of randomized security protocols and compared their performance on several examples. Using SPAN, we conducted the first automated verification of a 3-ballot electronic voting protocol.


Related Work. As alluded to above, techniques for analyzing security protocols have remained largely disjoint from techniques for analyzing systems with randomization. Using probabilistic model checkers such as PRISM [44], STORM [27] and APEX [42], some have attempted to verify protocols that explicitly employ randomization [56]. These ad-hoc techniques fail to capture powerful threat models, such as a Dolev-Yao adversary, and do not provide a general verification framework. Other works in the Dolev-Yao framework [28,43] simply abstract away essential protocol components that utilize randomization, such as anonymous channels. The first formal framework combining Dolev-Yao analysis with randomization appeared in [10], where the authors studied the conditions under which security properties of randomized protocols are preserved by protocol composition. In [53], the results were extended to indistinguishability.

Complexity-theoretic results on verifying secrecy and indistinguishability properties of bounded sessions of randomized security protocols against unbounded Dolev-Yao adversaries were studied in [18]. There the authors considered protocols with a fixed equational theory¹ and no negative tests (else branches). Both secrecy and indistinguishability were shown to be in coNEXPTIME, with secrecy being coNEXPTIME-hard. The analogous problems for purely non-deterministic protocols are known to be coNP-complete [25,33,51]. When one fixes, a priori, the number of coin tosses, secrecy and indistinguishability in randomized protocols again become coNP-complete. In our asymptotic complexity results and in the SPAN tool, we consider a general class of equational theories and protocols that allow negative tests.


2 Preliminaries 


We assume that the reader is familiar with probability distributions. For a set $X$, $Dist(X)$ shall denote the set of all discrete distributions $\mu$ on $X$ such that $\mu(x)$ is a rational number for each $x \in X$. For $x \in X$, $\delta_x$ will denote the Dirac distribution, i.e., the measure $\mu$ such that $\mu(x) = 1$. The support of a discrete distribution $\mu$, denoted $support(\mu)$, is the set of all elements $x$ such that $\mu(x) \neq 0$.


Markov Decision Processes (MDPs). MDPs are used to model processes that exhibit both probabilistic and non-deterministic behavior. An MDP $\mathcal{M}$ is a tuple $(Z, z_s, Act, \Delta)$ where $Z$ is a countable set of states, $z_s \in Z$ is the initial state, $Act$ is a countable set of actions and $\Delta : Z \times Act \to Dist(Z)$ is the probabilistic transition function. $\mathcal{M}$ is said to be finite if the sets $Z$ and $Act$ are finite. An execution of an MDP is a sequence $\rho = z_0 \xrightarrow{\alpha_1} z_1 \xrightarrow{\alpha_2} \cdots \xrightarrow{\alpha_m} z_m$ such that $z_0 = z_s$ and $z_{i+1} \in support(\Delta(z_i, \alpha_{i+1}))$ for all $i \in \{0, \ldots, m-1\}$. The measure of $\rho$, denoted $prob_{\mathcal{M}}(\rho)$, is $\prod_{i=0}^{m-1} \Delta(z_i, \alpha_{i+1})(z_{i+1})$. For the execution $\rho$, we write $last(\rho) = z_m$ and say that the length of $\rho$, denoted $|\rho|$, is $m$. The set of all executions of $\mathcal{M}$ is denoted as $Exec(\mathcal{M})$.
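To make the definition concrete, here is a minimal Python sketch (not taken from SPAN) that encodes $\Delta$ as a dictionary keyed by (state, action) pairs, with each value a dictionary of exact rational probabilities standing in for an element of $Dist(Z)$, and computes $prob_{\mathcal{M}}(\rho)$. The function name `execution_measure`, the list encoding of executions and the toy coin MDP are illustrative assumptions.

```python
from fractions import Fraction


def execution_measure(delta, z_s, execution):
    """Return prob_M(rho) for rho given as [z0, a1, z1, ..., am, zm].

    delta maps (state, action) pairs to dicts {successor: probability}, so
    delta[(z, a)] plays the role of Delta(z, a) in Dist(Z).  Sequences that do
    not start in z_s, or that leave support(Delta(z_i, a_{i+1})), are not
    executions of M and get measure 0 here.
    """
    states, actions = execution[0::2], execution[1::2]
    if states[0] != z_s:
        return Fraction(0)
    measure = Fraction(1)
    for z, a, z_next in zip(states, actions, states[1:]):
        p = delta.get((z, a), {}).get(z_next, Fraction(0))
        if p == 0:
            return Fraction(0)
        measure *= p  # multiply in Delta(z_i, a_{i+1})(z_{i+1})
    return measure


# Toy MDP (hypothetical): "toss" flips a fair coin from s; h and t loop forever.
half = Fraction(1, 2)
coin = {
    ("s", "toss"): {"h": half, "t": half},
    ("h", "toss"): {"h": Fraction(1)},
    ("t", "toss"): {"t": Fraction(1)},
}

print(execution_measure(coin, "s", ["s", "toss", "h", "toss", "h"]))  # 1/2
```

Using `Fraction` rather than floats mirrors the requirement that distributions take rational values and keeps equality tests on measures exact.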


Partially Observable Markov Decision Processes (POMDPs). A POMDP $\mathcal{M}$ is a tuple $(Z, z_s, Act, \Delta, \mathcal{O}, obs)$ where $\mathcal{M}_0 = (Z, z_s, Act, \Delta)$ is an MDP, $\mathcal{O}$ is a countable set of observations and $obs : Z \to \mathcal{O}$ is a labeling of states with observations. $\mathcal{M}$ is said to be finite if $\mathcal{M}_0$ is finite. The set of executions of $\mathcal{M}_0$ is taken to be the set of executions of $\mathcal{M}$, i.e., we define $Exec(\mathcal{M})$ as the set $Exec(\mathcal{M}_0)$. Given an execution $\rho = z_0 \xrightarrow{\alpha_1} z_1 \xrightarrow{\alpha_2} \cdots \xrightarrow{\alpha_m} z_m$ of $\mathcal{M}$, the trace of $\rho$ is $tr(\rho) = obs(z_0)\alpha_1 obs(z_1)\alpha_2 \cdots \alpha_m obs(z_m)$. For a POMDP $\mathcal{M}$ and a sequence $\theta \in \mathcal{O} \cdot (Act \cdot \mathcal{O})^*$, the probability of $\theta$ in $\mathcal{M}$, written $prob_{\mathcal{M}}(\theta)$, is the sum of the measures of executions in $Exec(\mathcal{M})$ with trace $\theta$. Given two POMDPs $\mathcal{M}_0$ and $\mathcal{M}_1$ with the same set of actions $Act$ and the same set of observations $\mathcal{O}$, we say that $\mathcal{M}_0$ and $\mathcal{M}_1$ are distinguishable if there exists $\theta \in \mathcal{O} \cdot (Act \cdot \mathcal{O})^*$ such that $prob_{\mathcal{M}_0}(\theta) \neq prob_{\mathcal{M}_1}(\theta)$. If $\mathcal{M}_0$ and $\mathcal{M}_1$ cannot be distinguished, they are said to be indistinguishable. We write $\mathcal{M}_0 \approx \mathcal{M}_1$ if $\mathcal{M}_0$ and $\mathcal{M}_1$ are indistinguishable. As is the case in [18,53], indistinguishability can also be defined through a notion of an adversary. Our formulation is equivalent, even when the adversary is allowed to toss coins [18].

¹ The operations considered are pairing, hashing, encryption and decryption.
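The quantity $prob_{\mathcal{M}}(\theta)$ can be computed without enumerating executions by pushing forward the mass of all executions consistent with the prefix of $\theta$ read so far. The Python sketch below does this and adds a brute-force search for a distinguishing trace up to a fixed length. It is only an illustration under assumed encodings (a POMDP as a tuple `(z_s, delta, obs)`, hypothetical function names); the bounded search is not a decision procedure for $\approx$.

```python
from fractions import Fraction
from itertools import product


def trace_probability(pomdp, theta):
    """prob_M(theta) for theta = [o0, a1, o1, ..., am, om].

    pomdp is (z_s, delta, obs).  `mass` holds, for each state z, the total
    measure of executions that end in z and whose trace matches the prefix of
    theta read so far; the answer sums this over all states.
    """
    z_s, delta, obs = pomdp
    mass = {z_s: Fraction(1)} if obs[z_s] == theta[0] else {}
    for a, o in zip(theta[1::2], theta[2::2]):
        nxt = {}
        for z, m in mass.items():
            for z2, p in delta.get((z, a), {}).items():
                if obs[z2] == o:  # drop executions whose trace leaves theta
                    nxt[z2] = nxt.get(z2, Fraction(0)) + m * p
        mass = nxt
    return sum(mass.values(), Fraction(0))


def find_distinguishing_trace(m0, m1, actions, observations, max_steps):
    """Brute-force search for theta with prob_M0(theta) != prob_M1(theta).

    Only traces with at most max_steps actions are tried, so returning None
    is NOT a proof of indistinguishability.
    """
    for k in range(max_steps + 1):
        for o0 in observations:
            for steps in product(actions, observations, repeat=k):
                theta = [o0, *steps]
                if trace_probability(m0, theta) != trace_probability(m1, theta):
                    return theta
    return None


half, one = Fraction(1, 2), Fraction(1)
obs = {"s": "init", "h": "H", "t": "T"}
loops = {("h", "toss"): {"h": one}, ("t", "toss"): {"t": one}}
fair = ("s", {("s", "toss"): {"h": half, "t": half}, **loops}, obs)
biased = ("s", {("s", "toss"): {"h": Fraction(1, 3), "t": Fraction(2, 3)}, **loops}, obs)
# Same observable behaviour as `fair`, but "heads" is split over two hidden states.
split = ("s", {("s", "toss"): {"h1": half / 2, "h2": half / 2, "t": half},
               ("h1", "toss"): {"h1": one}, ("h2", "toss"): {"h2": one},
               ("t", "toss"): {"t": one}},
         {"s": "init", "h1": "H", "h2": "H", "t": "T"})

print(find_distinguishing_trace(fair, biased, ["toss"], ["init", "H", "T"], 2))
# ['init', 'toss', 'H']
print(find_distinguishing_trace(fair, split, ["toss"], ["init", "H", "T"], 2))
# None
```

The `split` POMDP shows why only observations matter: its two hidden "heads" states are never told apart by any trace, so it behaves exactly like `fair`.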


Probabilistic Finite Automata (PFAs). A PFA is like a finite-state deterministic automaton except that the transition function from a state on a given input is described as a probability distribution. Formally, a PFA $\mathcal{A}$ is a tuple $(Q, \Sigma, q_s, \Delta, F)$ where $Q$ is a finite set of states, $\Sigma$ is a finite input alphabet, $q_s \in Q$ is the initial state, $\Delta : Q \times \Sigma \to Dist(Q)$ is the transition relation and $F \subseteq Q$ is the set of accepting states. A run $\rho$ of $\mathcal{A}$ on an input word $u = a_1 a_2 \cdots a_m \in \Sigma^*$ is a sequence $q_0 q_1 \cdots q_m \in Q^*$ such that $q_0 = q_s$ and $q_i \in support(\Delta(q_{i-1}, a_i))$ for each $1 \le i \le m$. For the run $\rho$ on word $u$, its measure, denoted $prob_{\mathcal{A},u}(\rho)$, is $\prod_{i=1}^{m} \Delta(q_{i-1}, a_i)(q_i)$. The run $\rho$ is called accepting if $q_m \in F$. The probability of accepting a word $u \in \Sigma^*$, written $prob_{\mathcal{A}}(u)$, is the sum of the measures of the accepting runs on $u$. Two PFAs $\mathcal{A}_0$ and $\mathcal{A}_1$ with the same input alphabet $\Sigma$ are said to be equivalent, denoted $\mathcal{A}_0 \equiv \mathcal{A}_1$, if $prob_{\mathcal{A}_0}(u) = prob_{\mathcal{A}_1}(u)$ for all input words $u \in \Sigma^*$.
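Summing over accepting runs one by one is exponential in $|u|$, but the same quantity can be obtained by propagating a single vector of run measures letter by letter. A minimal Python sketch, assuming a PFA is encoded as a tuple `(q_s, delta, accepting)`; the encoding and the name `accept_probability` are hypothetical.

```python
from fractions import Fraction


def accept_probability(pfa, word):
    """prob_A(u): sum of the measures of the accepting runs of A on u.

    pfa is (q_s, delta, accepting) with delta[(q, a)] a dict {q': probability}.
    `dist` maps each state q to the total measure of runs on the prefix read so
    far that end in q.
    """
    q_s, delta, accepting = pfa
    dist = {q_s: Fraction(1)}
    for a in word:
        nxt = {}
        for q, m in dist.items():
            for q2, p in delta.get((q, a), {}).items():
                nxt[q2] = nxt.get(q2, Fraction(0)) + m * p
        dist = nxt
    return sum((m for q, m in dist.items() if q in accepting), Fraction(0))


# Hypothetical PFA over {"x"}: from q0, reading x moves to q1 or stays, each
# with probability 1/2; q1 is accepting and absorbing.
half = Fraction(1, 2)
pfa = ("q0", {("q0", "x"): {"q0": half, "q1": half},
              ("q1", "x"): {"q1": Fraction(1)}}, {"q1"})
for u in ["", "x", "xx"]:
    print(repr(u), accept_probability(pfa, u))  # '' 0 / 'x' 1/2 / 'xx' 3/4
```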


3 POMDP Indistinguishability 


In this section, we study the underlying semantic objects of randomized security 
protocols, POMDPs. The techniques we develop for analyzing POMDPs provide 
the foundation for the indistinguishability algorithms we implement in the SPAN 
protocol analysis tool. Our first result shows that indistinguishability of finite 
POMDPs is decidable in polynomial time by a reduction to PFA equivalence, 
which is known to be decidable in polynomial time [31,57]. 
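For intuition on why PFA equivalence is polynomial, consider the vectors $x_u$ that record, side by side, the run measures of both automata after reading $u$. Both acceptance probabilities are linear in $x_u$, and these vectors live in a space of dimension at most $|Q_0| + |Q_1|$, so a basis of their span can be found with polynomially many steps. SPAN itself reduces PFA equivalence to linear programming [31]; the Python sketch below instead swaps in this classical basis-computation approach (in the style of Tzeng's algorithm) purely as an illustration. The PFA encoding `(states, q_s, delta, accepting)` and all names are assumptions.

```python
from collections import deque
from fractions import Fraction


def pfa_equivalent(a0, a1, alphabet):
    """Decide whether prob_A0(u) == prob_A1(u) for every word u.

    Each PFA is (states, q_s, delta, accepting), delta[(q, a)] = {q': prob}.
    For a word u let x_u be the row vector of run measures of both automata
    side by side.  The search builds a basis of span{x_u}; since that space has
    dimension at most |Q0| + |Q1|, only polynomially many vectors are examined.
    """
    pfas = (a0, a1)
    index = [(i, q) for i in (0, 1) for q in pfas[i][0]]
    pos = {key: n for n, key in enumerate(index)}
    dim = len(index)

    def step(vec, letter):  # x_u -> x_{u letter}
        out = [Fraction(0)] * dim
        for n, x in enumerate(vec):
            if x:
                i, q = index[n]
                for q2, p in pfas[i][2].get((q, letter), {}).items():
                    out[pos[(i, q2)]] += x * p
        return out

    # eta is +1 on accepting states of A0 and -1 on those of A1, so that
    # x_u . eta = prob_A0(u) - prob_A1(u).
    eta = [Fraction(0)] * dim
    for n, (i, q) in enumerate(index):
        if q in pfas[i][3]:
            eta[n] = Fraction(1 if i == 0 else -1)

    basis = []  # pairs (pivot index, vector), kept in elimination order

    def reduce_against_basis(vec):
        for pivot, b in basis:
            if vec[pivot]:
                factor = vec[pivot] / b[pivot]
                vec = [x - factor * y for x, y in zip(vec, b)]
        return vec

    start = [Fraction(0)] * dim  # x_epsilon: both automata in their initial state
    start[pos[(0, a0[1])]] = Fraction(1)
    start[pos[(1, a1[1])]] = Fraction(1)
    queue = deque([start])
    while queue:
        vec = reduce_against_basis(queue.popleft())
        nonzero = [n for n, x in enumerate(vec) if x]
        if not nonzero:
            continue  # already in the span found so far
        basis.append((nonzero[0], vec))
        queue.extend(step(vec, letter) for letter in alphabet)
    # The basis spans every x_u, so checking it against eta suffices.
    return all(sum(x * e for x, e in zip(b, eta)) == 0 for _, b in basis)


def self_loops(*states):
    return {(q, "x"): {q: Fraction(1)} for q in states}


half = Fraction(1, 2)
fair = (["s", "h", "t"], "s", {("s", "x"): {"h": half, "t": half}, **self_loops("h", "t")}, {"h"})
split = (["s", "h1", "h2", "t"], "s",
         {("s", "x"): {"h1": half / 2, "h2": half / 2, "t": half}, **self_loops("h1", "h2", "t")},
         {"h1", "h2"})
biased = (["s", "h", "t"], "s",
          {("s", "x"): {"h": Fraction(1, 3), "t": Fraction(2, 3)}, **self_loops("h", "t")}, {"h"})
print(pfa_equivalent(fair, split, ["x"]))   # True
print(pfa_equivalent(fair, biased, ["x"]))  # False
```

Exact `Fraction` arithmetic matters here: with floating point, the span test could misclassify nearly dependent vectors.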


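Proposition 1. Indistinguishability of finite POMDPs is in P.

Proof (sketch). Consider two POMDPs $\mathcal{M}_i = (Z_i, z_s^i, Act, \Delta_i, \mathcal{O}, obs_i)$ for $i \in \{0,1\}$ with the same set of actions $Act$ and the same set of observations $\mathcal{O}$. We shall construct PFAs $\mathcal{A}_0$ and $\mathcal{A}_1$ such that $\mathcal{M}_0 \approx \mathcal{M}_1$ iff $\mathcal{A}_0 \equiv \mathcal{A}_1$ as follows. For $i \in \{0,1\}$, let $bad_i$ be a new state and define the PFA $\mathcal{A}_i = (Q_i, \Sigma, q_s^i, \Delta^{\mathcal{A}}_i, F_i)$ where $Q_i = Z_i \cup \{bad_i\}$, $\Sigma = Act \times \mathcal{O}$, $q_s^i = z_s^i$, $F_i = Z_i$ and $\Delta^{\mathcal{A}}_i$ is defined as follows.

$$\Delta^{\mathcal{A}}_i(q, (\alpha, o))(q') = \begin{cases} \Delta_i(q, \alpha)(q') & \text{if } q, q' \in Z_i \text{ and } obs_i(q) = o \\ 1 & \text{if } q \in Z_i,\ obs_i(q) \neq o \text{ and } q' = bad_i \\ 1 & \text{if } q = q' = bad_i \\ 0 & \text{otherwise} \end{cases}$$

Let $u = (\alpha_1, o_0) \cdots (\alpha_k, o_{k-1})$ be a non-empty word on $\Sigma$. For the word $u$, let $\theta_u$ be the trace $o_0 \alpha_1 o_1 \alpha_2 \cdots \alpha_{k-1} o_{k-1}$. The proposition follows immediately from the observation that $prob_{\mathcal{A}_i}(u) = prob_{\mathcal{M}_i}(\theta_u)$.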
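The construction in the proof can be checked mechanically on small examples. The Python sketch below builds $\mathcal{A}_i$ from a POMDP encoded as `(z_s, delta, obs)`, maps a word $u$ to its trace $\theta_u$, and compares $prob_{\mathcal{A}_i}(u)$ with $prob_{\mathcal{M}_i}(\theta_u)$. The encoding, the sink represented as the tuple `("bad",)` (assumed not to clash with any POMDP state) and all function names are hypothetical; the helpers repeat the forward computations so that the block runs on its own.

```python
from fractions import Fraction


def pomdp_to_pfa(pomdp, actions, observations):
    """Build the PFA A_i of the proof of Proposition 1 from (z_s, delta, obs).

    Reading the letter (a, o) in a state q first checks obs(q) == o and then
    moves according to Delta(q, a); a failed check moves to the sink bad_i.
    The accepting states are exactly the original states Z_i.
    """
    z_s, delta, obs = pomdp
    bad = ("bad",)  # hypothetical fresh sink state bad_i
    pfa_delta = {}
    for q in obs:
        for a in actions:
            for o in observations:
                ok = obs[q] == o
                pfa_delta[(q, (a, o))] = dict(delta[(q, a)]) if ok else {bad: Fraction(1)}
    for a in actions:
        for o in observations:
            pfa_delta[(bad, (a, o))] = {bad: Fraction(1)}
    return z_s, pfa_delta, set(obs)


def accept_probability(pfa, word):
    q_s, delta, accepting = pfa
    dist = {q_s: Fraction(1)}
    for letter in word:
        nxt = {}
        for q, m in dist.items():
            for q2, p in delta[(q, letter)].items():
                nxt[q2] = nxt.get(q2, Fraction(0)) + m * p
        dist = nxt
    return sum((m for q, m in dist.items() if q in accepting), Fraction(0))


def trace_probability(pomdp, theta):
    z_s, delta, obs = pomdp
    mass = {z_s: Fraction(1)} if obs[z_s] == theta[0] else {}
    for a, o in zip(theta[1::2], theta[2::2]):
        nxt = {}
        for z, m in mass.items():
            for z2, p in delta[(z, a)].items():
                if obs[z2] == o:
                    nxt[z2] = nxt.get(z2, Fraction(0)) + m * p
        mass = nxt
    return sum(mass.values(), Fraction(0))


def trace_of_word(word):
    """theta_u = o0 a1 o1 ... a_{k-1} o_{k-1} for u = (a1, o0) ... (ak, o_{k-1})."""
    theta = [word[0][1]]
    for (a, _), (_, o) in zip(word, word[1:]):
        theta += [a, o]
    return theta


half, one = Fraction(1, 2), Fraction(1)
fair = ("s", {("s", "toss"): {"h": half, "t": half},
              ("h", "toss"): {"h": one}, ("t", "toss"): {"t": one}},
        {"s": "init", "h": "H", "t": "T"})
pfa = pomdp_to_pfa(fair, ["toss"], ["init", "H", "T"])
u = [("toss", "init"), ("toss", "H")]
print(trace_of_word(u), accept_probability(pfa, u), trace_probability(fair, trace_of_word(u)))
# ['init', 'toss', 'H'] 1/2 1/2
```

Note that the last action $\alpha_k$ of $u$ does not appear in $\theta_u$: because $\Delta_i(z, \alpha_k)$ is a distribution over $Z_i$, applying it cannot change the accepted mass.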


An MDP $\mathcal{M} = (Z, z_s, Act, \Delta)$ is said to be acyclic if there is a set of absorbing states $Z_{abs} \subseteq Z$ such that for all $\alpha \in Act$ and $z \in Z_{abs}$, $\Delta(z, \alpha)(z) = 1$, and for all $\rho = z_0 \xrightarrow{\alpha_1} \cdots \xrightarrow{\alpha_m} z_m \in Exec(\mathcal{M})$, if $z_i = z_j$ for $i \neq j$ then $z_i \in Z_{abs}$. Intuitively, acyclic MDPs are MDPs that have a set of “final” absorbing states and the only cycles in the underlying graph are self-loops on these states. A POMDP $\mathcal{M} = (Z, z_s, Act, \Delta, \mathcal{O}, obs)$ is acyclic if the MDP $\mathcal{M}_0 = (Z, z_s, Act, \Delta)$ is acyclic. We have the following result, which can be shown from the P-hardness of the PFA equivalence problem [45].
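For a finite MDP, acyclicity can be tested with a depth-first search. Taking $Z_{abs}$ to be the largest candidate set (all states with a probability-1 self-loop under every action) loses nothing, since enlarging $Z_{abs}$ only weakens the repetition condition; the MDP is then acyclic iff no cycle through a non-absorbing state is reachable from $z_s$. A minimal Python sketch under these assumptions, with a hypothetical dictionary encoding of $\Delta$:

```python
from fractions import Fraction


def is_acyclic(z_s, actions, delta):
    """Check that a finite MDP is acyclic in the sense defined above (sketch).

    delta[(z, a)] = {z': prob} must be defined for every reachable z and every
    action a.  Z_abs is taken to be the largest candidate set: all states z
    with Delta(z, a)(z) = 1 for every action a.
    """
    states = {z for (z, _a) in delta} | {z2 for d in delta.values() for z2 in d}
    absorbing = {z for z in states
                 if all(delta.get((z, a), {}).get(z, 0) == 1 for a in actions)}
    on_stack, done = set(), set()

    def dfs(z):  # False if a cycle of non-absorbing states is reachable from z
        on_stack.add(z)
        for a in actions:
            for z2, p in delta[(z, a)].items():
                if p == 0 or z2 in absorbing or z2 in done:
                    continue
                if z2 in on_stack or not dfs(z2):
                    return False  # back edge: a non-absorbing state can repeat
        on_stack.discard(z)
        done.add(z)
        return True

    return z_s in absorbing or dfs(z_s)


half, one = Fraction(1, 2), Fraction(1)
coin = {("s", "toss"): {"h": half, "t": half},
        ("h", "toss"): {"h": one}, ("t", "toss"): {"t": one}}
retry = {("s", "toss"): {"u": half, "h": half},
         ("u", "toss"): {"s": one}, ("h", "toss"): {"h": one}}
print(is_acyclic("s", ["toss"], coin))   # True
print(is_acyclic("s", ["toss"], retry))  # False: s -> u -> s repeats s
```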


Proposition 2. Indistinguishability of finite acyclic POMDPs is P-hard. Hence, indistinguishability of finite POMDPs is P-complete.


Thanks to Proposition 1, we can check indistinguishability for finite POMDPs by reducing it to PFA equivalence. We now present a new algorithm for indistinguishability of finite acyclic POMDPs. A well-known POMDP analysis technique is to translate a POMDP $\mathcal{M}$ into a fully observable belief MDP $\mathcal{B}(\mathcal{M})$ that emulates it. One can then analyze $\mathcal{B}(\mathcal{M})$ to infer properties of $\mathcal{M}$. The states of $\mathcal{B}(\mathcal{M})$ are probability distributions over the states of $\mathcal{M}$. Further, given a state $b$ of $\mathcal{B}(\mathcal{M})$, if states $z_1, z_2$ of $\mathcal{M}$ are such that $b(z_1), b(z_2)$ are non-zero then $z_1$ and $z_2$ must have the same observation. Hence, by abuse of notation, we can define $obs(b)$ to be $obs(z)$ if $b(z) \neq 0$. Intuitively, an execution $\rho = b_0 \xrightarrow{\alpha_1} b_1 \xrightarrow{\alpha_2} \cdots \xrightarrow{\alpha_m} b_m$ of $\mathcal{B}(\mathcal{M})$ corresponds to the set of all executions $\rho'$ of $\mathcal{M}$ such that $tr(\rho') = obs(b_0)\alpha_1 obs(b_1)\alpha_2 \cdots \alpha_m obs(b_m)$. The measure of execution $\rho$ in $\mathcal{B}(\mathcal{M})$ is exactly $prob_{\mathcal{M}}(obs(b_0)\alpha_1 obs(b_1)\alpha_2 \cdots \alpha_m obs(b_m))$.

The initial state of $\mathcal{B}(\mathcal{M})$ is the distribution that assigns 1 to the initial state of $\mathcal{M}$. Intuitively, on a given state $b \in Dist(Z)$ and an action $\alpha$, there is at most one successor state $b^{\alpha,o}$ for each observation $o$. The probability of transitioning from $b$ to $b^{\alpha,o}$ is the probability that $o$ is observed given that the distribution on the states of $\mathcal{M}$ is $b$ and action $\alpha$ is performed; $b^{\alpha,o}(z)$ is the conditional probability that the actual state of the POMDP is $z$. The formal definition follows.


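Definition 1. Let $\mathcal{M} = (Z, z_s, Act, \Delta, \mathcal{O}, obs)$ be a POMDP. The belief MDP of $\mathcal{M}$, denoted $\mathcal{B}(\mathcal{M})$, is the tuple $(Dist(Z), \delta_{z_s}, Act, \Delta^{\mathcal{B}})$ where $\Delta^{\mathcal{B}}$ is defined as follows. For $b \in Dist(Z)$, action $\alpha \in Act$ and $o \in \mathcal{O}$, let

$$p_{b,\alpha,o} = \sum_{z \in Z} b(z) \cdot \Big( \sum_{z' \in Z \,\wedge\, obs(z') = o} \Delta(z, \alpha)(z') \Big).$$

$\Delta^{\mathcal{B}}(b, \alpha)$ is the unique distribution such that for each $o \in \mathcal{O}$, if $p_{b,\alpha,o} \neq 0$ then $\Delta^{\mathcal{B}}(b, \alpha)(b^{\alpha,o}) = p_{b,\alpha,o}$ where for all $z' \in Z$,

$$b^{\alpha,o}(z') = \begin{cases} \dfrac{\sum_{z \in Z} b(z) \cdot \Delta(z, \alpha)(z')}{p_{b,\alpha,o}} & \text{if } obs(z') = o \\ 0 & \text{otherwise.} \end{cases}$$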
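The two formulas of Definition 1 translate directly into a belief-update routine. In the Python sketch below (an illustration, not SPAN's implementation), a belief is a dictionary from states to exact probabilities with zero entries omitted; the routine groups the mass reachable under $\alpha$ by observation, giving $p_{b,\alpha,o}$ as the group total and $b^{\alpha,o}$ as the normalized group. The name `belief_successors` and the encodings are assumptions.

```python
from fractions import Fraction


def belief_successors(delta, obs, b, a):
    """Return {o: (p_{b,a,o}, b^{a,o})} for every o with p_{b,a,o} > 0.

    b is a belief given as a dict {z: probability} (zero entries omitted).
    """
    reach = {}  # z' -> sum_z b(z) * Delta(z, a)(z')
    for z, bz in b.items():
        for z2, p in delta[(z, a)].items():
            reach[z2] = reach.get(z2, Fraction(0)) + bz * p
    by_obs = {}  # group the reachable mass by the observation of z'
    for z2, m in reach.items():
        if m:
            by_obs.setdefault(obs[z2], {})[z2] = m
    result = {}
    for o, masses in by_obs.items():
        p_bao = sum(masses.values(), Fraction(0))  # p_{b,a,o}
        result[o] = (p_bao, {z2: m / p_bao for z2, m in masses.items()})  # b^{a,o}
    return result


half, one = Fraction(1, 2), Fraction(1)
delta = {("s", "toss"): {"h1": half / 2, "h2": half / 2, "t": half},
         ("h1", "toss"): {"h1": one}, ("h2", "toss"): {"h2": one},
         ("t", "toss"): {"t": one}}
obs = {"s": "init", "h1": "H", "h2": "H", "t": "T"}
for o, (p, belief) in belief_successors(delta, obs, {"s": one}, "toss").items():
    print(o, p, belief)
```

Here observing H from the initial belief happens with probability 1/2, and the resulting belief is uniform over the two hidden heads states h1 and h2.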
Let $\mathcal{M}_i = (Z_i, z_s^i, Act, \Delta_i, \mathcal{O}, obs_i)$ for $i \in \{0,1\}$ be POMDPs with the same set of actions and observations. In [14] the authors show that $\mathcal{M}_0$ and $\mathcal{M}_1$ are indistinguishable if and only if the beliefs $\delta_{z_s^0}$ and $\delta_{z_s^1}$ are strongly belief bisimilar. Strong belief bisimilarity coincides with the notion of bisimilarity of labeled MDPs: a pair of states $(b_0, b_1) \in Dist(Z_0) \times Dist(Z_1)$ is said to be strongly belief bisimilar if (i) $obs(b_0) = obs(b_1)$, (ii) for all $\alpha \in Act$ and $o \in \mathcal{O}$, $p_{b_0,\alpha,o} = p_{b_1,\alpha,o}$, and (iii) the pair $(b_0^{\alpha,o}, b_1^{\alpha,o})$ is strongly belief bisimilar if $p_{b_0,\alpha,o} = p_{b_1,\alpha,o} > 0$. Observe that, in general, belief MDPs are defined over an infinite state space. It is easy to see that, for a finite acyclic POMDP $\mathcal{M}$, $\mathcal{B}(\mathcal{M})$ is acyclic and has a finite number of reachable belief states. Let $\mathcal{M}_0$ and $\mathcal{M}_1$ be as above and assume further that $\mathcal{M}_0, \mathcal{M}_1$ are finite and acyclic with absorbing states $Z_{abs} \subseteq Z_0 \cup Z_1$. As a consequence of the result from [14] and the observations above, we can determine if two states $(b_0, b_1) \in Dist(Z_0) \times Dist(Z_1)$ are strongly belief bisimilar using the on-the-fly procedure from Algorithm 1.


Algorithm 1. On-the-fly bisimulation for finite acyclic POMDPs

function BISIMILAR(beliefState b_0, beliefState b_1)
    if obs(b_0) ≠ obs(b_1) then return false
    if support(b_0) ∪ support(b_1) ⊆ Z_abs then return true
    for α ∈ Act do
        for o ∈ O do
            if p_{b_0,α,o} ≠ p_{b_1,α,o} then return false
            if p_{b_0,α,o} > 0 and !BISIMILAR(b_0^{α,o}, b_1^{α,o}) then return false
    return true
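A runnable rendering of Algorithm 1 in Python is sketched below, assuming each POMDP is encoded as a tuple `(delta, obs, Z_abs)` and beliefs as dictionaries with zero entries omitted; these encodings and the names `successors` and `bisimilar` are hypothetical. Instead of looping over all of $\mathcal{O}$, it only visits observations with positive probability under at least one of the two beliefs, since pairs where both probabilities are zero pass the test in line "if $p_{b_0,\alpha,o} \neq p_{b_1,\alpha,o}$" trivially and trigger no recursion. Recursion terminates because both POMDPs are assumed finite and acyclic.

```python
from fractions import Fraction


def successors(pomdp, b, a):
    """{o: (p_{b,a,o}, b^{a,o})} for the belief b in pomdp = (delta, obs, Z_abs)."""
    delta, obs, _ = pomdp
    by_obs = {}
    for z, bz in b.items():
        for z2, p in delta[(z, a)].items():
            if bz * p:
                bucket = by_obs.setdefault(obs[z2], {})
                bucket[z2] = bucket.get(z2, Fraction(0)) + bz * p
    result = {}
    for o, masses in by_obs.items():
        p_bao = sum(masses.values(), Fraction(0))
        result[o] = (p_bao, {z: m / p_bao for z, m in masses.items()})
    return result


def bisimilar(m0, b0, m1, b1, actions):
    """Algorithm 1: are beliefs b0 of m0 and b1 of m1 strongly belief bisimilar?"""
    if {m0[1][z] for z in b0} != {m1[1][z] for z in b1}:
        return False                                   # obs(b0) != obs(b1)
    if set(b0) <= m0[2] and set(b1) <= m1[2]:
        return True                                    # only absorbing states left
    for a in actions:
        s0, s1 = successors(m0, b0, a), successors(m1, b1, a)
        for o in set(s0) | set(s1):                    # o with p_{b,a,o} > 0 somewhere
            p0 = s0[o][0] if o in s0 else Fraction(0)
            p1 = s1[o][0] if o in s1 else Fraction(0)
            if p0 != p1:
                return False
            # here p0 == p1 > 0, so both successor beliefs exist
            if not bisimilar(m0, s0[o][1], m1, s1[o][1], actions):
                return False
    return True


half, one = Fraction(1, 2), Fraction(1)
obs = {"s": "init", "h": "H", "t": "T", "h1": "H", "h2": "H"}
loops = {(z, "toss"): {z: one} for z in ("h", "t", "h1", "h2")}
fair = ({("s", "toss"): {"h": half, "t": half}, **loops}, obs, {"h", "t"})
biased = ({("s", "toss"): {"h": Fraction(1, 3), "t": Fraction(2, 3)}, **loops}, obs, {"h", "t"})
split = ({("s", "toss"): {"h1": half / 2, "h2": half / 2, "t": half}, **loops}, obs,
         {"h1", "h2", "t"})
start = {"s": one}
print(bisimilar(fair, start, split, start, ["toss"]))   # True
print(bisimilar(fair, start, biased, start, ["toss"]))  # False
```

As in the OTF algorithm, successor beliefs are computed only when they are needed, so a mismatch near the initial belief stops the exploration early.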


4 Randomized Security Protocols 


We now present our core process calculus for modeling security protocols with coin tosses. The calculus closely resembles the ones from [10,53]. First proposed in [39], it extends the applied π-calculus [5] by the inclusion of a new operator for probabilistic choice. As in the applied π-calculus, the calculus assumes that messages are terms in a first-order signature identified up to an equational theory.


4.1 Terms, Equational Theories and Frames 


A signature $\mathcal{F}$ contains a finite set of function symbols, each with an associated arity. We assume $\mathcal{F}$ contains two special disjoint sets, $\mathcal{N}_{pub}$ and $\mathcal{N}_{priv}$, of 0-ary symbols.² The elements of $\mathcal{N}_{pub}$ are called public names and represent public nonces that can be used by the Dolev-Yao adversary. The elements of $\mathcal{N}_{priv}$ are called names and represent secret nonces and secret keys. We also assume a set of variables that are partitioned into two disjoint sets $\mathcal{X}$ and $\mathcal{X}_w$. The variables in $\mathcal{X}$ are called protocol variables and are used as placeholders for messages input by protocol participants. The variables in $\mathcal{X}_w$ are called frame variables and are used to point to messages received by the Dolev-Yao adversary. Terms are built by the application of function symbols to variables and terms in the standard way. Given a signature $\mathcal{F}$ and $\mathcal{Y} \subseteq \mathcal{X} \cup \mathcal{X}_w$, we use $\mathcal{T}(\mathcal{F}, \mathcal{Y})$ to denote the set of terms built over $\mathcal{F}$ and $\mathcal{Y}$. The set of variables occurring in a term $u$ is denoted by $vars(u)$. A ground term is a term that contains no free variables.

² As we assume $\mathcal{F}$ is finite, only a fixed number of public nonces is available to the adversary.

A substitution $\sigma$ is a partial function with a finite domain that maps variables to terms. $dom(\sigma)$ will denote the domain and $ran(\sigma)$ will denote the range. For a substitution $\sigma$ with $dom(\sigma) = \{x_1, \ldots, x_k\}$, we denote $\sigma$ as $\{x_1 \mapsto \sigma(x_1), \ldots, x_k \mapsto \sigma(x_k)\}$. A substitution $\sigma$ is said to be ground if every term in $ran(\sigma)$ is ground, and a substitution with an empty domain will be denoted as $\emptyset$. Substitutions can be applied to terms in the usual way and we write $u\sigma$ for the term obtained by applying the substitution $\sigma$ to the term $u$.
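A minimal Python sketch of terms and substitutions may help fix the notation. It assumes a hypothetical representation in which variables are `Var` objects, function applications are `Fn` objects, and names are 0-ary `Fn` symbols; substitutions are dictionaries from variables to terms, so their domain is always finite.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str                     # names are 0-ary symbols, e.g. Fn("n")
    args: Tuple["Term", ...] = ()


Term = Union[Var, Fn]
Substitution = Dict[Var, Term]


def variables(u: Term) -> set:
    """vars(u): the set of variables occurring in u."""
    if isinstance(u, Var):
        return {u}
    return set().union(*(variables(t) for t in u.args))


def apply(u: Term, sigma: Substitution) -> Term:
    """u sigma: replace each variable in dom(sigma) by its image."""
    if isinstance(u, Var):
        return sigma.get(u, u)
    return Fn(u.symbol, tuple(apply(t, sigma) for t in u.args))


def is_ground(sigma: Substitution) -> bool:
    """A substitution is ground if every term in ran(sigma) is ground."""
    return all(not variables(t) for t in sigma.values())


x, y = Var("x"), Var("y")
u = Fn("pair", (x, Fn("fst", (y,))))
sigma = {x: Fn("n")}
print(apply(u, sigma))
```

Frozen dataclasses make terms hashable, so they can serve as keys of substitutions and as elements of $vars(u)$.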

Our process algebra is parameterized by an equational theory $(\mathcal{F}, E)$, where $E$ is a set of $\mathcal{F}$-Equations. By an $\mathcal{F}$-Equation, we mean a pair $u = v$ where $u, v \in \mathcal{T}(\mathcal{F} \setminus \mathcal{N}_{priv}, \mathcal{X})$ are terms that do not contain private names. We will assume that the equations of $(\mathcal{F}, E)$ can be oriented to produce a convergent rewrite system. Two terms $u$ and $v$ are said to be equal with respect to an equational theory $(\mathcal{F}, E)$, denoted $u =_E v$, if $E \vdash u = v$ in the first-order theory of equality. We often identify an equational theory $(\mathcal{F}, E)$ by $E$ when the signature is clear from the context.

In the calculus, all communication is mediated through an adversary: all outputs first go to an adversary and all inputs are provided by the adversary. Hence, processes are executed in an environment that consists of a frame $\varphi : \mathcal{X}_w \to \mathcal{T}(\mathcal{F}, \emptyset)$ and a ground substitution $\sigma : \mathcal{X} \to \mathcal{T}(\mathcal{F}, \emptyset)$. Intuitively, $\varphi$ represents the sequence of messages an adversary has received from protocol participants and $\sigma$ records the binding of the protocol variables to actual input messages. An adversary is limited to sending only those messages that it can deduce from the messages it has received thus far. Formally, a term $u \in \mathcal{T}(\mathcal{F}, \emptyset)$ is deducible from a frame $\varphi$ with recipe $r \in \mathcal{T}(\mathcal{F} \setminus \mathcal{N}_{priv}, dom(\varphi))$ in equational theory $E$, denoted $\varphi \vdash_E^r u$, if $r\varphi =_E u$. We will often omit $r$ and $E$ and write $\varphi \vdash u$ if they are clear from the context.

We now recall an equivalence on frames, called static equivalence [5]. Intuitively, two frames are statically equivalent if the adversary cannot distinguish them by performing tests. The tests consist of checking whether two recipes deduce the same term. Formally, two frames $\varphi_1$ and $\varphi_2$ are said to be statically equivalent in equational theory $E$, denoted $\varphi_1 \equiv_E \varphi_2$, if $dom(\varphi_1) = dom(\varphi_2)$ and for all $r_1, r_2 \in \mathcal{T}(\mathcal{F} \setminus \mathcal{N}_{priv}, \mathcal{X}_w)$ we have $r_1\varphi_1 =_E r_2\varphi_1$ iff $r_1\varphi_2 =_E r_2\varphi_2$.
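Because the definition quantifies over all recipes, a direct check is impossible; engines such as KiSs and AKiSs use dedicated procedures. Purely to illustrate what the tests look like, the Python sketch below compares both frames on every test $r_1 = r_2$ built from recipes up to a fixed depth, in a hypothetical toy theory with pairing and projections (a subterm convergent theory, so $=_E$ is decided by comparing normal forms). Terms are strings (names, frame variables) or tuples `(f, *args)`; all names here are assumptions, and a `True` answer only means that no test up to the given depth tells the frames apart.

```python
from itertools import product

# Hypothetical toy theory: pairing with projections (subterm convergent).
SIGNATURE = {"pair": 2, "fst": 1, "snd": 1}


def normalize(t):
    """Normal form under fst(pair(x, y)) -> x and snd(pair(x, y)) -> y."""
    if isinstance(t, str):
        return t
    head, *args = t
    args = [normalize(a) for a in args]
    if head in ("fst", "snd") and isinstance(args[0], tuple) and args[0][0] == "pair":
        return args[0][1] if head == "fst" else args[0][2]
    return (head, *args)


def apply_frame(r, frame):
    """r phi: replace the frame variables of recipe r by their messages."""
    if isinstance(r, str):
        return frame.get(r, r)
    return (r[0], *(apply_frame(a, frame) for a in r[1:]))


def recipes(atoms, depth):
    """All recipes over SIGNATURE and the given atoms, of depth <= depth."""
    layer = set(atoms)
    for _ in range(depth):
        layer |= {(f, *args) for f, n in SIGNATURE.items()
                  for args in product(layer, repeat=n)}
    return layer


def no_test_distinguishes(phi1, phi2, public_names, depth):
    """Bounded approximation of static equivalence of phi1 and phi2.

    False means a distinguishing test r1 = r2 was found; True only means that
    no test with recipes of depth <= depth distinguishes the frames.
    """
    if phi1.keys() != phi2.keys():
        return False
    rs = list(recipes(list(public_names) + sorted(phi1), depth))
    n1 = {r: normalize(apply_frame(r, phi1)) for r in rs}
    n2 = {r: normalize(apply_frame(r, phi2)) for r in rs}
    return all((n1[r1] == n1[r2]) == (n2[r1] == n2[r2])
               for r1, r2 in product(rs, repeat=2))


# A pair of distinct private nonces vs. a pair of equal private nonces.
phi1 = {"w1": ("pair", "n", "m")}
phi2 = {"w1": ("pair", "n", "n")}
print(no_test_distinguishes(phi1, phi2, ["a"], 1))                # False: fst(w1) = snd(w1)
print(no_test_distinguishes({"w1": "n"}, {"w1": "m"}, ["a"], 2))  # True
```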


4.2 Process Syntax 


Processes in our calculus are the parallel composition of roles. Intuitively, a role 
models a single actor in a single session of the protocol. Syntactically, a role is 
derived from the grammar: 
$$R ::= 0 \mid in(x)^{\ell} \mid out(u_0 \cdot R +_p u_1 \cdot R)^{\ell} \mid ite([c_1 \wedge \ldots \wedge c_k], R, R)^{\ell} \mid (R \cdot R)$$


where $p$ is a rational number in the unit interval $[0,1]$, $\ell \in \mathbb{N}$, $x \in \mathcal{X}$, $u_0, u_1 \in \mathcal{T}(\mathcal{F}, \mathcal{X})$ and $c_i$ is $u_i = v_i$ with $u_i, v_i \in \mathcal{T}(\mathcal{F}, \mathcal{X})$ for all $i \in \{1, \ldots, k\}$. The constructs $in(x)^{\ell}$, $out(u_0 \cdot R +_p u_1 \cdot R)^{\ell}$ and $ite([c_1 \wedge \ldots \wedge c_k], R, R)^{\ell}$ are said to be labeled operations and $\ell \in \mathbb{N}$ is said to be their label. The role $0$ does nothing. The role $in(x)^{\ell}$ reads a term $u$ from the public channel and binds it to $x$. The role $out(u_0 \cdot R +_p u_1 \cdot R')^{\ell}$ outputs the term $u_0$ on the public channel and becomes $R$ with probability $p$, and it outputs the term $u_1$ and becomes $R'$ with probability $1 - p$. A test $[c_1 \wedge \ldots \wedge c_k]$ is said to pass if for all $1 \le i \le k$, the equality $c_i$ holds. The conditional role $ite([c_1 \wedge \ldots \wedge c_k], R, R')^{\ell}$ becomes $R$ if $[c_1 \wedge \ldots \wedge c_k]$ passes and otherwise it becomes $R'$. The role $R \cdot R'$ is the sequential composition of role $R$ followed by role $R'$. The set of variables of a role $R$ is the set of variables occurring in $R$. The construct $in(x)^{\ell} \cdot R$ binds variable $x$ in $R$. The set of free and bound variables in a role can be defined in the standard way. We will assume that the set of free variables and bound variables of a role are disjoint and that a bound variable is bound only once in a role. A role $R$ is said to be well-formed if every labeled operation occurring in $R$ has the same label $\ell$; the label $\ell$ is said to be the label of the well-formed role $R$.

A process is the parallel composition of a finite set of roles $R_1, \ldots, R_n$, denoted $R_1 \mid \ldots \mid R_n$. We will use $P$ and $Q$ to denote processes. A process $R_1 \mid \ldots \mid R_n$ is said to be well-formed if each role is well-formed, the sets of variables of $R_i$ and $R_j$ are disjoint for $i \neq j$, and the labels of roles $R_i$ and $R_j$ are different for $i \neq j$. For the remainder of this paper, processes are assumed to be well-formed. The set of free (resp. bound) variables of $P$ is the union of the sets of free (resp. bound) variables of its roles. $P$ is said to be ground if the set of its free variables is empty. We shall omit labels when they are not relevant in a particular context.


Example 2. We model the electronic voting protocol from Example 1 in our formalism. The protocol is built over the equational theory with signature $\mathcal{F} = \{sk/1, pk/1, aenc/3, adec/2, pair/2, fst/1, snd/1\}$ and the equations

$$E = \{adec(aenc(m, r, pk(k)), sk(k)) = m,\ fst(pair(m_1, m_2)) = m_1,\ snd(pair(m_1, m_2)) = m_2\}.$$

The function $sk$ (resp. $pk$) is used to generate a secret (resp. public) key from a nonce. For generation of their public key pairs, Alice, Bob and the election authority hold private names $k_A$, $k_B$ and $k_{EA}$, respectively. The candidates will be modeled using public names $c_0$ and $c_1$ and the tokens will be modeled using private names $t_A$ and $t_B$. Additionally, we will write $y_i$ and $r_i$ for $i \in \mathbb{N}$ to denote fresh input variables and private names, respectively. The roles of Alice, Bob and the election authority are as follows.
$$\begin{aligned}
A(c_A) &:= in(y_0) \cdot out(aenc(pair(adec(y_0, sk(k_A)), c_A), r_0, pk(k_{EA})))\\
B(c_B) &:= in(y_1) \cdot out(aenc(pair(adec(y_1, sk(k_B)), c_B), r_1, pk(k_{EA})))\\
EA &:= out(aenc(t_A, r_2, pk(k_A))) \cdot out(aenc(t_B, r_3, pk(k_B))) \cdot in(y_3) \cdot in(y_4) \cdot {}\\
&\quad ite([fst(adec(y_3, sk(k_{EA}))) = t_A \wedge fst(adec(y_4, sk(k_{EA}))) = t_B],\\
&\qquad out(pair(snd(adec(y_3, sk(k_{EA}))), snd(adec(y_4, sk(k_{EA})))) +_{\frac{1}{2}}\\
&\qquad\quad pair(snd(adec(y_4, sk(k_{EA}))), snd(adec(y_3, sk(k_{EA}))))), \ldots)
\end{aligned}$$

In the roles above, we write $out(u_0)$ as shorthand for $out(u_0 \cdot 0 +_1 u_0 \cdot 0)$ and $out(u_0 +_p u_1)$ as shorthand for $out(u_0 \cdot 0 +_p u_1 \cdot 0)$. The entire protocol is $evote(c_A, c_B) = A(c_A) \mid B(c_B) \mid EA$.
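Since the equations of Example 2 are convergent when oriented left to right, $u =_E v$ can be decided by comparing normal forms, and deducibility $\varphi \vdash^r u$ reduces to normalizing $r\varphi$. The Python sketch below illustrates this for the theory of Example 2 under a hypothetical encoding (terms as strings or tuples `(f, *args)`); it is not SPAN's rewriting back end, which queries Maude [24]. The usage shows how Alice's output collapses to an encrypted pair of her token and vote once $y_0$ is bound to the election authority's first message.

```python
def normalize(t):
    """Normal form w.r.t. the rules of Example 2, oriented left to right.

    Terms are strings (names, variables) or tuples (f, *args).
    """
    if isinstance(t, str):
        return t
    head, *args = t
    args = [normalize(a) for a in args]
    if head == "adec":
        cipher, key = args
        # adec(aenc(m, r, pk(k)), sk(k)) -> m
        if (isinstance(cipher, tuple) and cipher[0] == "aenc"
                and isinstance(cipher[3], tuple) and cipher[3][0] == "pk"
                and isinstance(key, tuple) and key[0] == "sk"
                and cipher[3][1] == key[1]):
            return cipher[1]
    if head in ("fst", "snd") and isinstance(args[0], tuple) and args[0][0] == "pair":
        # fst(pair(m1, m2)) -> m1 and snd(pair(m1, m2)) -> m2
        return args[0][1] if head == "fst" else args[0][2]
    return (head, *args)


def substitute(t, sigma):
    """t sigma for a substitution (or frame) given as a dict."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0], *(substitute(a, sigma) for a in t[1:]))


def deducible(frame, recipe, u):
    """phi |-^r u: the rules are convergent, so compare normal forms."""
    return normalize(substitute(recipe, frame)) == normalize(u)


token_msg = ("aenc", "tA", "r2", ("pk", "kA"))  # first output of EA
alice_out = ("aenc", ("pair", ("adec", "y0", ("sk", "kA")), "cA"), "r0", ("pk", "kEA"))
ballot = normalize(substitute(alice_out, {"y0": token_msg}))
print(ballot)  # ('aenc', ('pair', 'tA', 'cA'), 'r0', ('pk', 'kEA'))

frame = {"w1": token_msg, "w3": ("pair", "c0", "c1")}
print(deducible(frame, ("snd", "w3"), "c1"))  # True
```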


4.3 Process Semantics 


An extended process is a 3-tuple $(P, \varphi, \sigma)$ where $P$ is a process, $\varphi$ is a frame and $\sigma$ is a ground substitution whose domain contains the free variables of $P$. We will write $\mathcal{E}$ to denote the set of all extended processes. Semantically, a ground process $P$ with $n$ roles is a POMDP $[\![P]\!] = (Z, z_s, Act, \Delta, \mathcal{O}, obs)$, where $Z = \mathcal{E} \cup \{error\}$, $z_s$ is $(P, \emptyset, \emptyset)$, $Act = (\mathcal{T}(\mathcal{F} \setminus \mathcal{N}_{priv}, \mathcal{X}_w) \cup \{\tau\}) \times \{1, \ldots, n\}$, $\Delta$ is a function that maps an extended process and an action to a distribution on $\mathcal{E}$, $\mathcal{O}$ is the set of equivalence classes on frames over the static equivalence relation $\equiv_E$ and $obs$ is as follows. Let $[\varphi]$ denote the equivalence class of $\varphi$ with respect to $\equiv_E$. Define $obs$ to be the function such that for any extended process $\eta = (P, \varphi, \sigma)$, $obs(\eta) = [\varphi]$. We now give some additional notation needed for the definition of $\Delta$. Given a measure $\mu$ on $\mathcal{E}$ and role $R$, we define $\mu \cdot R$ to be the distribution $\mu_1$ on $\mathcal{E}$ such that $\mu_1(P', \varphi, \sigma) = \mu(P, \varphi, \sigma)$ if $\mu(P, \varphi, \sigma) > 0$ and $P'$ is $P \cdot R$, and 0 otherwise. Given a measure $\mu$ on $\mathcal{E}$ and a process $Q$, we define $\mu \mid Q$ to be the distribution $\mu_1$ on $\mathcal{E}$ such that $\mu_1(P', \varphi, \sigma) = \mu(P, \varphi, \sigma)$ if $\mu(P, \varphi, \sigma) > 0$ and $P'$ is $P \mid Q$, and 0 otherwise. The distribution $Q \mid \mu$ is defined analogously. For distributions $\mu_1, \mu_2$ over $\mathcal{E}$ and a rational number $p \in [0,1]$, the convex combination $\mu_1 +_p \mu_2$ is the distribution $\mu$ on $\mathcal{E}$ such that $\mu(\eta) = p \cdot \mu_1(\eta) + (1 - p) \cdot \mu_2(\eta)$ for all $\eta \in \mathcal{E}$. The definition of $\Delta$ is given in Fig. 1, where we write $(P, \varphi, \sigma) \xrightarrow{\alpha} \mu$ if $\Delta((P, \varphi, \sigma), \alpha) = \mu$. If $\Delta((P, \varphi, \sigma), \alpha)$ is undefined in Fig. 1 then $\Delta((P, \varphi, \sigma), \alpha) = \delta_{error}$. Note that $\Delta$ is well-defined, as roles are deterministic.


4.4 Indistinguishability in Randomized Cryptographic Protocols 


Protocols $P$ and $P'$ are said to be indistinguishable if $[\![P]\!] \approx [\![P']\!]$. Many interesting properties of randomized security protocols can be specified using indistinguishability. For example, consider the simple electronic voting protocol from Example 2. We say that the protocol satisfies the vote privacy property if $evote(c_0, c_1)$ and $evote(c_1, c_0)$ are indistinguishable.

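In the remainder of this section, we study the problem of deciding when two protocols are indistinguishable by a bounded Dolev-Yao adversary. We restrict our attention to indistinguishability of protocols over subterm convergent equational theories [4]. Before presenting our results, we give some relevant definitions. $(\mathcal{F}, E)$ is said to be subterm convergent if for every equation $u = v \in E$ oriented as a rewrite rule $u \to v$, either $v$ is a proper subterm of $u$ or $v$ is a public name. A term $u$ can be represented as a directed acyclic graph (dag), denoted $dag(u)$ [4,51]. Every node in $dag(u)$ is a function symbol, name or variable. Nodes labeled by names and variables have out-degree 0. A node labeled with a function symbol $f$ has out-degree equal to the arity of $f$, where outgoing edges of the node are labeled from 1 to the arity of $f$. Every node of $dag(u)$ represents a unique subterm of $u$. The depth of a term $u$, denoted $depth(u)$, is the length of the longest simple path from the root in $dag(u)$. Given an action $\alpha$, $depth(\alpha) = 0$ if $\alpha = (\tau, j)$ and $depth(\alpha) = m$ if $\alpha = (r, j)$ and $depth(r) = m$.

$$\frac{r \in \mathcal{T}(\mathcal{F} \setminus \mathcal{N}_{priv}, \mathcal{X}_w) \quad \varphi \vdash^{r} u \quad x \notin dom(\sigma)}{(in(x)^{\ell}, \varphi, \sigma) \xrightarrow{(r,\ell)} \delta_{(0,\, \varphi,\, \sigma \cup \{x \mapsto u\})}}$$

$$\frac{i = |dom(\varphi)| + 1 \quad \varphi_j = \varphi \cup \{w^{(i,\ell)} \mapsto u_j\sigma\} \text{ for } j \in \{0,1\}}{(out(u_0 \cdot R_0 +_p u_1 \cdot R_1)^{\ell}, \varphi, \sigma) \xrightarrow{(\tau,\ell)} \delta_{(R_0, \varphi_0, \sigma)} +_p \delta_{(R_1, \varphi_1, \sigma)}}\ \text{(OUT)}$$

$$\frac{\forall i \in \{1, \ldots, k\},\ c_i \text{ is } u_i = v_i \text{ and } u_i\sigma =_E v_i\sigma}{(ite([c_1 \wedge \ldots \wedge c_k], R, R')^{\ell}, \varphi, \sigma) \xrightarrow{(\tau,\ell)} \delta_{(R, \varphi, \sigma)}}\ \text{(CONDIF)}$$

$$\frac{\exists i \in \{1, \ldots, k\},\ c_i \text{ is } u_i = v_i \text{ and } u_i\sigma \neq_E v_i\sigma}{(ite([c_1 \wedge \ldots \wedge c_k], R, R')^{\ell}, \varphi, \sigma) \xrightarrow{(\tau,\ell)} \delta_{(R', \varphi, \sigma)}}\ \text{(CONDELSE)}$$

$$\frac{R \neq 0 \quad (R, \varphi, \sigma) \xrightarrow{\alpha} \mu}{(R \cdot R', \varphi, \sigma) \xrightarrow{\alpha} \mu \cdot R'} \qquad \frac{(R, \varphi, \sigma) \xrightarrow{\alpha} \mu}{(0 \cdot R, \varphi, \sigma) \xrightarrow{\alpha} \mu}$$

$$\frac{(Q, \varphi, \sigma) \xrightarrow{\alpha} \mu}{(Q \mid Q', \varphi, \sigma) \xrightarrow{\alpha} \mu \mid Q'} \qquad \frac{(Q', \varphi, \sigma) \xrightarrow{\alpha} \mu}{(Q \mid Q', \varphi, \sigma) \xrightarrow{\alpha} Q \mid \mu}$$

Fig. 1. Process semantics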
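The depth definition, and the restriction to actions of bounded depth used next, can be sketched in a few lines of Python. The sketch hash-conses a term into its dag (one node per distinct subterm, so shared subterms are stored once) and computes the longest path from the root with memoization; since a dag path follows argument positions, the result equals the height of the term tree. Terms are encoded as strings (names, variables) or tuples `(f, *args)`, and the constant `TAU` standing for $\tau$ is a hypothetical encoding.

```python
TAU = "tau"  # hypothetical encoding of the silent action label tau


def dag(u):
    """Hash-consed dag(u): one node per distinct subterm, mapped to its children."""
    nodes = {}

    def visit(t):
        if t not in nodes:
            children = t[1:] if isinstance(t, tuple) else ()
            nodes[t] = children
            for c in children:
                visit(c)

    visit(u)
    return nodes


def depth(u):
    """Length of the longest path from the root of dag(u)."""
    nodes, memo = dag(u), {}

    def longest(t):
        if t not in memo:
            memo[t] = max((1 + longest(c) for c in nodes[t]), default=0)
        return memo[t]

    return longest(u)


def action_depth(action):
    recipe, _label = action
    return 0 if recipe == TAU else depth(recipe)


def restrict_actions(actions, d):
    """Act_d: the actions of depth at most d."""
    return [a for a in actions if action_depth(a) <= d]


c = ("aenc", "m", "r", ("pk", "k"))
u = ("pair", c, c)              # the shared subterm c is a single dag node
print(len(dag(u)), depth(u))    # 6 3
```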

Let $P$ be a protocol such that $[\![P]\!] = (Z, z_s, Act, \Delta, \mathcal{O}, obs)$. Define $[\![P]\!]_d$ to be the POMDP $(Z, z_s, Act_d, \Delta, \mathcal{O}, obs)$ where $Act_d \subseteq Act$ is the set of actions $\alpha \in Act$ with $depth(\alpha) \le d$. For a constant $d \in \mathbb{N}$, we define InDist($d$) to be the decision problem that, given a subterm convergent theory $(\mathcal{F}, E)$ and protocols $P$ and $P'$ over $(\mathcal{F}, E)$, determines if $[\![P]\!]_d$ and $[\![P']\!]_d$ are indistinguishable. We assume that the arity of the function symbols in $\mathcal{F}$ is given in unary. We have the following.


Theorem 1. For any constant $d \in \mathbb{N}$, InDist($d$) is in PSPACE.


We now show InDist($d$) is both NP-hard and coNP-hard by showing a reduction from #SAT_D to InDist($d$). #SAT_D is the decision problem that, given a 3CNF formula $\phi$ and a constant $k \in \mathbb{N}$, checks if the number of satisfying assignments of $\phi$ is equal to $k$.
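To make the source problem of the reduction concrete, the Python sketch below decides #SAT_D by brute force over all $2^n$ assignments; it is exponential and only illustrates the problem, not the reduction. Clauses use DIMACS-style integer literals ($v$ or $-v$), which is an assumed encoding. For instance, $k = 0$ asks whether $\phi$ is unsatisfiable.

```python
from itertools import product


def count_models(clauses, num_vars):
    """Number of satisfying assignments of a CNF formula (brute force).

    Clauses use DIMACS-style literals: v or -v for variable v in 1..num_vars.
    """
    count = 0
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count


def sat_d(clauses, num_vars, k):
    """#SAT_D: does the formula have exactly k satisfying assignments?"""
    return count_models(clauses, num_vars) == k


# (x1 or x2 or x3) and (not x1 or not x2 or not x3): all but 2 of 8 assignments.
phi = [[1, 2, 3], [-1, -2, -3]]
print(count_models(phi, 3), sat_d(phi, 3, 6))  # 6 True
```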
Theorem 2. There is a $d_0 \in \mathbb{N}$ such that #SAT_D reduces to InDist($d$) in logspace for every $d > d_0$. Thus, InDist($d$) is NP-hard and coNP-hard for every $d > d_0$.


5 Implementation and Evaluation 


Using (the proof of) Proposition 1, we can solve the indistinguishability problem for randomized security protocols as follows. For protocols $P, P'$, translate $[\![P]\!], [\![P']\!]$ into PFAs $\mathcal{A}, \mathcal{A}'$ and determine if $\mathcal{A} \equiv \mathcal{A}'$ using the linear programming algorithm from [31]. We will henceforth refer to this approach as the PFA algorithm and the approach from Algorithm 1 as the OTF algorithm. We have implemented both the PFA and OTF algorithms as part of the Stochastic Protocol ANalyzer (SPAN), which is a Java-based tool for analyzing randomized security protocols. The tool is available for download at [1]. The main engine of SPAN translates a protocol into a POMDP, belief MDP or PFA by exploring all protocol executions and grouping equivalent states using a static equivalence engine, either KiSs [4] or AKiSs [16]. KiSs is guaranteed to terminate for subterm convergent theories, and AKiSs provides support for XOR while considering a slightly larger class of equational theories called optimally reducing. Operations from rewriting logic are provided by queries to Maude [24] and support for arbitrary precision numbers is given by Apfloat [2]. Our experiments were conducted on an Intel Core i7 dual quad-core processor at 2.67 GHz with 12 GB of RAM. The host operating system was 64-bit Ubuntu 16.04.3 LTS.

Our comparison of the PFA and OTF algorithms began by examining how each approach scaled on a variety of examples (detailed at the end of this section). The results of the analysis are given in Fig. 2. For each example, we consider a fixed recipe depth and report the running times for 2 parties as well as the maximum number of parties for which one of the algorithms terminates within the timeout bound of 60 min. On small examples for which the protocols were indistinguishable, we found that the OTF and PFA algorithms were roughly equivalent. On large examples where the protocols were indistinguishable, such as the 3-ballot protocol, the PFA algorithm did not scale as well as the OTF algorithm. In particular, an out-of-memory exception often occurred during construction of the automata or the linear programming constraints. On examples for which the protocols were distinguishable, the OTF algorithm demonstrated a significant advantage. This is because the OTF approach analyzed the model as it was constructed: if at any point during model construction the bisimulation relation was determined not to hold, model construction was halted. By contrast, the PFA algorithm required the entire model to be constructed and stored before any analysis could take place.

In addition to stress-testing the tool, we also examined how each algorithm performed under various parameters of the mix-network example. The results are given in Fig. 3, where we examine how running times are affected by scaling the number of protocol participants and the recipe depth. Our results coincided with the observations from above. One interesting observation is that the number of beliefs explored on the 5-party example was identical for recipe depth 4 and recipe depth 10. The reason is that, for a given protocol input step, SPAN generates a minimal set of recipes, in the sense that if distinct recipes $r_0, r_1$ are generated at an input step with frame $\varphi$, then $r_0\varphi \neq_E r_1\varphi$. For the given number of public names available to the protocol, changing the recipe depth from 4 to 10 did not alter the number of unique terms that could be constructed by the attacker.

| Protocol | Parties | Depth | Equiv | PFA, KiSs (s) | PFA, AKiSs (s) | OTF, KiSs (s) | OTF, AKiSs (s) | States | Beliefs |
|---|---|---|---|---|---|---|---|---|---|
| DC-net | 2 | 10 | true | n/s | 5.5 | n/s | 4 | 58 | 24 |
| DC-net | 3 | 10 | true | n/s | OOM | n/s | 3013 | n/a | 286 |
| mix-net | 2 | 10 | false | TO | TO | 3 | ? | n/a | ? |
| mix-net | 5 | 10 | false | OOM | OOM | 582 | 1586 | n/a | 79654 |
| Evote | 2 | 10 | true | 1 | 1 | 0.5 | 1 | 34 | 33 |
| Evote | 8 | 10 | true | 105 | 105 | 131 | 124 | 94 | 93 |
| 3 Ballot | 2 | 10 | true | n/s | OOM | n/s | 1444 | n/a | 408 |

Fig. 2. Experimental results. Columns 1 and 2 describe the example being analyzed. Column 3 gives the maximum recipe depth and column 4 indicates whether the example protocols were indistinguishable. Columns 5-8 give the running time (in seconds) for the respective algorithms and static equivalence engines. We report OOM for an out-of-memory exception and TO for a timeout, which occurs if no solution is generated within 60 min. Column 9 gives the number of states in the protocol’s POMDP and column 10 gives the number of belief states explored in the OTF algorithm. When information could not be determined due to a failure of the tool to terminate, we report n/a. For protocols using equational theories that were not subterm convergent, we write n/s (not supported) for the KiSs engine.

We conclude this section by describing our benchmark examples, which are available at [3]. Evote is the simple electronic voting protocol derived from Example 2, and the DC-net, mix-net and 3-ballot protocols are described below.


Dining Cryptographers Networks. In a simple DC-net protocol [38], two parties
Alice and Bob want to anonymously publish two confidential bits $m_A$ and $m_B$,
respectively. To achieve this, Alice and Bob agree on three private random bits
$b_0$, $b_1$ and $b_2$ and output a pair of messages according to the following
scheme. In our model of the protocol, the private bits are generated by a trusted
third party who communicates them to Alice and Bob using symmetric encryption.

$$
\begin{array}{llll}
\text{If } b_0 = 0: & \text{Alice: } & M_{A,0} = b_1 \oplus m_A, & M_{A,1} = b_2 \\
                    & \text{Bob: }   & M_{B,0} = b_1,            & M_{B,1} = b_2 \oplus m_B \\
\text{If } b_0 = 1: & \text{Alice: } & M_{A,0} = b_1,            & M_{A,1} = b_2 \oplus m_A \\
                    & \text{Bob: }   & M_{B,0} = b_1 \oplus m_B, & M_{B,1} = b_2
\end{array}
$$

From the protocol output, the messages $m_A$ and $m_B$ can be retrieved as
$M_{A,0} \oplus M_{B,0}$ and $M_{A,1} \oplus M_{B,1}$. The party to which the
messages belong, however, remains unconditionally private, provided the exchanged
secrets are not revealed.
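To make the XOR bookkeeping of the scheme concrete, here is a minimal C sketch, assuming single-bit messages and private bits encoded as the integers 0 and 1. The names `dc_out`, `dc_alice`, `dc_bob` and `dc_recover` are hypothetical and are not part of SPAN or of the protocol model; the sketch only mirrors the equations above.

```c
#include <assert.h>

/* The pair of messages (M_{P,0}, M_{P,1}) output by one party P. */
typedef struct { int m0, m1; } dc_out;

/* Alice's outputs for the private bits b0, b1, b2 and her message bit mA. */
dc_out dc_alice(int b0, int b1, int b2, int mA) {
    assert(((b0 | b1 | b2 | mA) & ~1) == 0);   /* all inputs are bits */
    dc_out o;
    if (b0 == 0) { o.m0 = b1 ^ mA; o.m1 = b2; }
    else         { o.m0 = b1;      o.m1 = b2 ^ mA; }
    return o;
}

/* Bob's outputs: the mirror image, so each slot is blinded by exactly one
   party's message. */
dc_out dc_bob(int b0, int b1, int b2, int mB) {
    assert(((b0 | b1 | b2 | mB) & ~1) == 0);
    dc_out o;
    if (b0 == 0) { o.m0 = b1;      o.m1 = b2 ^ mB; }
    else         { o.m0 = b1 ^ mB; o.m1 = b2; }
    return o;
}

/* What any observer of the four published messages can compute. */
void dc_recover(dc_out a, dc_out b, int *slot0, int *slot1) {
    *slot0 = a.m0 ^ b.m0;   /* M_{A,0} xor M_{B,0} */
    *slot1 = a.m1 ^ b.m1;   /* M_{A,1} xor M_{B,1} */
}
```

The slot in which each message appears is decided by $b_0$, which is never output, so an observer learns both bits but not which party sent which.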


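(1) PARTIES | (2) DEPTH | (3) EQUIV | (4) PFA, Kiss (s) | (5) PFA, AKiSs (s) | (6) OTF, Kiss (s) | (7) OTF, AKiSs (s) | (8) STATES | (9) BELIEFS
2 | 1 | true  | 3   | 3   | 2   | 3    | 15   | 12
3 | 1 | true  | 1   | 1.2 | ?   | 9    | 81   | 50
4 | 1 | true  | 47  | 47  | 2   | 6    | 2075 | 656
5 | 1 | true  | OOM | OOM | 34  | 79   | n/a  | 4032
5 | 2 | false | OOM | OOM | 13  | 33   | n/a  | 1382
5 | 3 | false | OOM | OOM | 124 | 354  | n/a  | 6934
5 | 4 | false | OOM | OOM | 580 | 1578 | n/a  | 79654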


Fig. 3. Detailed Experimental Results for Mix Networks: The columns have an identical 
meaning to the ones from Fig. 2. We report OOM for an out of memory exception and 
when information could not be determined due to a failure of the tool to terminate, 
we report n/a. 


Mix Networks. A mix-network [21] is a routing protocol used to break the link
between a message's sender and the message. This is achieved by routing messages
through a series of proxy servers, called mixes. Each mix collects a batch of
encrypted messages, privately decrypts each message and forwards the resulting
messages in random order. More formally, consider a sender Alice (A) who wishes
to send a message m to Bob (B) through Mix (M). Alice prepares a ciphertext of
the form aenc(aenc(m, n_1, pk(B)), n_0, pk(M)), where aenc is asymmetric
encryption, n_0, n_1 are nonces and pk(M), pk(B) are the public keys of the Mix
and Bob, respectively. Upon receiving a batch of N such ciphertexts, the Mix
unwraps the outer layer of encryption on each message using its secret key,
randomly permutes the messages and forwards them. A passive attacker, who
observes all the traffic but does not otherwise modify the network, cannot (with
high probability) correlate messages entering and exiting the Mix. Unfortunately,
this simple design, known as a threshold mix, is vulnerable to a very simple
active attack. To expose Alice as the sender of the message aenc(m, n_1, pk(B)),
an attacker simply forwards Alice's message along with N-1 dummy messages to the
Mix. In this way, the attacker can distinguish which of the Mix's N output
messages is not a dummy message and hence must have originated from Alice.
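The (N-1) attack on a threshold mix can be simulated in a few lines. The following C sketch is a toy model, not the SPAN encoding: payloads are plain integers, removing the outer encryption layer is abstracted away (the mix only permutes), `rand()` stands in for the mix's private randomness, and the attacker's dummies carry payloads it chose itself (here the negative integers). The names `mix_forward` and `n_minus_1_attack` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BATCH 64

/* Threshold mix: randomly permute the batch in place (Fisher-Yates;
   rand() % k has a small bias, which is irrelevant for this sketch). */
void mix_forward(int *batch, int n) {
    assert(n > 0 && n <= MAX_BATCH);
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        int tmp = batch[i]; batch[i] = batch[j]; batch[j] = tmp;
    }
}

/* (N-1) attack: submit Alice's message with n-1 dummies whose inner payloads
   the attacker knows; the single unrecognised output must be Alice's. */
int n_minus_1_attack(int alice_payload, int n) {
    int batch[MAX_BATCH];
    assert(n > 0 && n <= MAX_BATCH && alice_payload >= 0);
    batch[0] = alice_payload;
    for (int i = 1; i < n; i++) batch[i] = -i;   /* dummy payloads -1 .. -(n-1) */
    mix_forward(batch, n);
    for (int i = 0; i < n; i++) {
        int is_dummy = 0;
        for (int d = 1; d < n; d++)
            if (batch[i] == -d) { is_dummy = 1; break; }
        if (!is_dummy) return batch[i];          /* linked back to Alice */
    }
    return -1;   /* unreachable: alice_payload is never a dummy payload */
}
```

However the mix permutes the batch, the attacker recovers Alice's payload, which is why the shuffle alone offers no protection against an active attacker.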


3-Ballot Electronic Voting. We have modeled and analyzed the 3-ballot voting
system from [54]. To simplify the presentation of this model, we first describe
the major concepts behind 3-ballot voting schemes, as originally introduced by
[50]. At the polling station, each voter is given 3 ballots at random. A ballot
consists of a list of candidates and a ballot ID. When casting a vote, a voter
begins by placing exactly one mark next to each candidate, on one of the three
ballots chosen at random. An additional mark is then placed next to the desired
candidate on one of the other ballots, again chosen at random. At the completion
of the procedure, at least one mark should have been placed on each ballot and
two ballots should have marks corresponding to the desired candidate. Once all
of the votes have been cast, ballots are collected and released to a public
bulletin board. Each voter retains a copy of one of the three ballots as a
receipt, which can be used to verify that his/her vote was counted.
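One consequence of the marking procedure is that the tally is still recoverable from the released ballots: every voter contributes one mark per candidate plus one extra mark for the chosen candidate, so the votes for a candidate equal its total marks minus the number of voters. The following C sketch illustrates this under stated assumptions: the data layout and the names `three_ballot`, `three_ballot_vote` and `three_ballot_tally` are hypothetical, and the rule "at least one mark on each ballot" is enforced here by simply redoing the random marking, which is a choice of this sketch rather than part of the scheme from [50] or [54].

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CAND 16

/* marks[b][c] == 1 iff ballot b (0..2) carries a mark next to candidate c. */
typedef struct { int marks[3][MAX_CAND]; } three_ballot;

void three_ballot_vote(three_ballot *tb, int n_cand, int choice) {
    assert(n_cand >= 2 && n_cand <= MAX_CAND && choice >= 0 && choice < n_cand);
    int ok;
    do {
        for (int b = 0; b < 3; b++)
            for (int c = 0; c < n_cand; c++) tb->marks[b][c] = 0;
        /* one mark per candidate, on a randomly chosen ballot */
        for (int c = 0; c < n_cand; c++) tb->marks[rand() % 3][c] = 1;
        /* extra mark for the chosen candidate on one of the two other ballots */
        int first = 0;
        while (!tb->marks[first][choice]) first++;
        tb->marks[(first + 1 + rand() % 2) % 3][choice] = 1;
        /* sketch assumption: redo the marking until every ballot has a mark */
        ok = 1;
        for (int b = 0; b < 3; b++) {
            int any = 0;
            for (int c = 0; c < n_cand; c++) any |= tb->marks[b][c];
            ok &= any;
        }
    } while (!ok);
}

/* votes(c) = marks(c) - n_voters over all released ballots. */
void three_ballot_tally(const three_ballot *tbs, int n_voters, int n_cand, int *votes) {
    for (int c = 0; c < n_cand; c++) {
        int marks = 0;
        for (int v = 0; v < n_voters; v++)
            for (int b = 0; b < 3; b++) marks += tbs[v].marks[b][c];
        votes[c] = marks - n_voters;
    }
}
```

No single ballot reveals the voter's choice, yet the aggregate over all ballots on the bulletin board yields the exact tally.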

In the full protocol, a registration agent is responsible for authenticating 
voters and receiving ballots and ballot ids generated by a vote manager. Once a 
voter marks his/her set of three ballots, they are returned to the vote manager 
who forwards them to one of three vote repositories. The vote repositories store 
the ballots they receive in a random position. After all votes have been collected 
in the repositories, they are released to a bulletin board by a vote collector. 
Communication between the registration agent, vote manager, vote repositories 
and vote collector is encrypted using asymmetric encryption and authenticated 
using digital signatures. In our modeling, we assume all parties behave honestly. 


6 Conclusion 


In this paper, we have considered the problem of model checking indistinguisha- 
bility in randomized security protocols that are executed with respect to a Dolev- 
Yao adversary. We have presented two different algorithms for the indistinguisha- 
bility problem assuming bounded recipe sizes. The algorithms have been imple- 
mented in the SPAN protocol analysis tool, which has been used to verify some 
well known randomized security protocols. We propose the following as part of 
future work: (i) extension of the current algorithms as well as the tool to the case of
unbounded recipe sizes; (ii) application of the tool for checking other randomized 
protocols; (iii) giving tight upper and lower bounds for the indistinguishability 
problem for the randomized protocols. 
Abstract. The secure information flow problem, which checks whether
low-security outputs of a program are influenced by high-security inputs, 
has many applications in verifying security properties in programs. In 
this paper we present lazy self-composition, an approach for verifying 
secure information flow. It is based on self-composition, where two copies 
of a program are created on which a safety property is checked. However, 
rather than an eager duplication of the given program, it uses duplication 
lazily to reduce the cost of verification. This lazy self-composition is 
guided by an interplay between symbolic taint analysis on an abstract 
(single copy) model and safety verification on a refined (two copy) model. 
We propose two verification methods based on lazy self-composition. The 
first is a CEGAR-style procedure, where the abstract model associated 
with taint analysis is refined, on demand, by using a model generated 
by lazy self-composition. The second is a method based on bounded 
model checking, where taint queries are generated dynamically during 
program unrolling to guide lazy self-composition and to conclude an 
adequate bound for correctness. We have implemented these methods on 
top of the SEAHORN verification platform and our evaluations show the 
effectiveness of lazy self-composition. 


1 Introduction 


Many security properties can be cast as the problem of verifying secure informa- 
tion flow. A standard approach to verifying secure information flow is to reduce it 
to a safety verification problem on a “self-composition” of the program, i.e., two 
“copies” of the program are created [5] and analyzed. For example, to check for 
information leaks or non-interference [17], low-security (public) inputs are ini- 
tialized to identical values in the two copies of the program, while high-security 
(confidential) inputs are unconstrained and can take different values. The safety 
check ensures that in all executions of the two-copy program, the values of the 
low-security (public) outputs are identical, i.e., there is no information leak from 
confidential inputs to public outputs. The self-composition approach is useful for 
checking general hyper-properties [11], and has been used in other applications,
such as verifying constant-time code for security [1] and k-safety properties of 
functions like injectivity and monotonicity [32]. 
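As a concrete reading of the self-composition reduction, the following C sketch checks non-interference of a toy program by brute force: the two copies receive the same low input and independently chosen high inputs, and their low outputs must agree. Exhaustive enumeration over a small, assumed input domain stands in for the symbolic safety check a verifier would perform; `prog_fn`, `self_compose_check`, `leaky` and `secure` are hypothetical names.

```c
#include <assert.h>

/* A toy program: low (public) input l, high (secret) input h, low output. */
typedef int (*prog_fn)(int l, int h);

/* Two copies, identical low inputs, arbitrary high inputs h1 and h2:
   returns 1 if the low outputs always agree (no leak), 0 otherwise. */
int self_compose_check(prog_fn p, int dom) {
    assert(dom > 0);
    for (int l = 0; l < dom; l++)
        for (int h1 = 0; h1 < dom; h1++)
            for (int h2 = 0; h2 < dom; h2++)
                if (p(l, h1) != p(l, h2)) return 0;   /* counterexample: leak */
    return 1;
}

int leaky(int l, int h)  { return l + (h & 1); }      /* output depends on h */
int secure(int l, int h) { (void)h; return 2 * l; }   /* output ignores h */
```

For example, `self_compose_check(leaky, 8)` finds a pair of secrets that differ in their lowest bit and reports a leak, while `secure` passes.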

Although the self-composition reduction is sound and complete, it is chal- 
lenging in practice to check safety properties on two copies of a program. There 
have been many efforts to reduce the cost of verification on self-composed pro- 
grams, e.g., by use of type-based analysis [33], constructing product programs 
with aligned fragments [4], lockstep execution of loops [32], transforming Horn 
clause rules [14,24], etc. The underlying theme in these efforts is to make it 
easier to derive relational invariants between the two copies, e.g., by keeping 
corresponding variables in the two copies near each other. 

In this paper, we aim to improve the self-composition approach by making it 
lazier in contrast to eager duplication into two copies of a program. Specifically, 
we use symbolic taint analysis to track flow of information from high-security 
inputs to other program variables. (This is similar to dynamic taint analysis [30], 
but covers all possible inputs due to static verification.) This analysis works 
on an abstract model of a single copy of the program and employs standard 
model checking techniques for achieving precision and path sensitivity. When this 
abstraction shows a counterexample, we refine it using on-demand duplication 
of relevant parts of the program. Thus, our lazy self-composition¹ approach is
guided by an interplay between symbolic taint analysis on an abstract (single 
copy) model and safety verification on a refined (two copy) model. 

We describe two distinct verification methods based on lazy self-composition. 
The first is an iterative procedure for unbounded verification based on coun- 
terexample guided abstraction refinement (CEGAR) [9]. Here, the taint analysis 
provides a sound over-approximation for secure information flow, i.e., if a low- 
security output is proved to be untainted, then it is guaranteed to not leak any 
information. However, even a path-sensitive taint analysis can sometimes lead to 
“false alarms”, i.e., a low-security output is tainted, but its value is unaffected 
by high-security inputs. For example, this can occur when a branch depends on 
a tainted variable, but the same (semantic, and not necessarily syntactic) value 
is assigned to a low-security output on both branches. Such false alarms for secu- 
rity due to taint analysis are then refined by lazily duplicating relevant parts of 
a program, and performing a safety check on the composed two-copy program. 
Furthermore, we use relational invariants derived on the latter to strengthen the 
abstraction within the iterative procedure. 
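The false-alarm scenario described above, a branch on a tainted variable that assigns the same (semantic, not syntactic) value on both sides, can be reproduced in a few lines of C. In this illustrative sketch, which is not the paper's encoding, `taint_of_out` applies the control-plus-data taint rule to the single conditional statement, while `outputs_equal_for_all_secrets` performs a brute-force self-composition over a small, assumed input range; all names are hypothetical.

```c
#include <assert.h>

/* The branch depends on the secret h, yet both branches assign l + 1. */
int branch_same(int l, int h) {
    int out;
    if (h > 0) out = l + 1; else out = 1 + l;
    return out;
}

/* Taint view of the same statement: out is tainted if the branch condition
   is tainted (implicit flow) or the selected right-hand side reads a tainted
   variable. Both right-hand sides read only l. */
int taint_of_out(int l_tainted, int h_tainted) {
    int cond_t = h_tainted;   /* condition reads h */
    int rhs_t = l_tainted;    /* l + 1 and 1 + l read only l */
    return cond_t || rhs_t;
}

/* Self-composition on a small domain: equal l, arbitrary h1 and h2. */
int outputs_equal_for_all_secrets(int dom) {
    assert(dom > 0);
    for (int l = 0; l < dom; l++)
        for (int h1 = -dom; h1 < dom; h1++)
            for (int h2 = -dom; h2 < dom; h2++)
                if (branch_same(l, h1) != branch_same(l, h2)) return 0;
    return 1;
}
```

The taint rule flags `out` (h tainted, l untainted), yet the self-composed check confirms that no information about h reaches `out`; this is exactly the kind of spurious leak that the refinement step is meant to eliminate.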

Our second method also takes a similar abstraction-refinement view, but in
the framework of bounded model checking (BMC) [6]. Here, we dynamically generate
taint queries (in the abstract single copy model) during program unrolling,
and use their result to simplify the duplication for self-composition (in the two
copy model). Specifically, the second copy duplicates the statements (update
logic) only if the taint query shows that the updated variable is possibly tainted.
Furthermore, we propose a specialized early termination check for the BMC-based
method. In many secure programs, sensitive information is propagated in a
localized context, but conditions exist that squash its propagation any further.
We formulate the early termination check as a taint check on all live variables
at the end of a loop body, i.e., if no live variable is tainted, then we can
conclude that the program is secure without further loop unrolling. (This is under
the standard assumption that inputs are tainted in the initial state. The early
termination check can be suitably modified if tainted inputs are allowed to occur
later.) Since our taint analysis is precise and path-sensitive, this approach can
be beneficial in practice: loop unrolling can stop as soon as all taint has been
squashed, rather than continuing past that point.

1 This name is inspired by the lazy abstraction approach [20] for software model
checking.
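The taint-guided duplication used by the BMC-based method can be illustrated on straight-line code. The following C sketch is a simplification under explicit assumptions: there are no branches (so no implicit flows), taint is computed concretely in one pass instead of through SMT taint queries, and variables and statements are limited to 32 each so they fit in bitmasks. `assign` and `lazy_dup_mask` are hypothetical names, not SeaHorn APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Straight-line assignment x_lhs := f(...), where reads is the bitmask of
   variables that f reads. */
typedef struct { int lhs; uint32_t reads; } assign;

/* Returns the bitmask of statements whose update the second copy must
   duplicate: exactly those whose left-hand side is possibly tainted. For all
   other statements the copies can share the value (x_d = x). */
uint32_t lazy_dup_mask(const assign *s, int n, uint32_t tainted) {
    assert(n >= 0 && n <= 32);
    uint32_t dup = 0;
    for (int k = 0; k < n; k++) {
        assert(s[k].lhs >= 0 && s[k].lhs < 32);
        if (s[k].reads & tainted) {
            tainted |= 1u << s[k].lhs;      /* taint propagates to lhs */
            dup |= 1u << k;                 /* duplicate this update */
        } else {
            tainted &= ~(1u << s[k].lhs);   /* untainted value squashes taint */
        }
    }
    return dup;
}
```

For instance, with variables h = 0 (secret), l = 1, a = 2, b = 3 and the statements `a := l + 1; b := 2 * h; a := b + l; b := 0`, only the second and third statements need a second copy (mask `0x6`), and after the last statement b carries no taint any more.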

We have implemented these methods in the SEAHORN verification plat- 
form [18], which represents programs as CHC (Constrained Horn Clause) rules. 
Our prototype for taint analysis is flexible, with a fully symbolic encoding of the 
taint policy (i.e., rules for taint generation, propagation, and removal). It fully 
leverages SMT-based model checking techniques for precise taint analysis. Our 
prototypes allow rich security specifications in terms of annotations on low/high- 
security variables and locations in arrays, and predicates that allow information 
downgrading in specified contexts. 

We present an experimental evaluation on benchmark examples. Our results 
clearly show the benefits of lazy self-composition vs. eager self-composition, 
where the former is much faster and allows verification to complete in larger 
examples. Our initial motivation in proposing the two verification methods was 
that we would find examples where one or the other method is better. We expect 
that easier proofs are likely to be found by the CEGAR-based method, and eas- 
ier bugs by the BMC-based method. As it turns out, most of our benchmark 
examples are easy to handle by both methods so far. We believe that our gen- 
eral approach of lazy self-composition would be beneficial in other verification 
methods, and both our methods show its effectiveness in practice. 

To summarize, this paper makes the following contributions. 


- We present lazy self-composition, an approach to verifying secure information
flow that reduces verification cost by exploiting the interplay between a
path-sensitive symbolic taint analysis and safety checking on a self-composed
program.

- We present IFC-CEGAR, a procedure for unbounded verification of secure
information flow based on lazy self-composition using the CEGAR paradigm.
IFC-CEGAR starts with a taint analysis abstraction of information flow and
iteratively refines this abstraction using self-composition. It is tailored toward
proving that programs have secure information flow.

- We present IFC-BMC, a procedure for bounded verification of secure information
flow. As the program is being unrolled, IFC-BMC uses dynamic symbolic taint
checks to determine which parts of the program need to be duplicated.
This method is tailored toward bug-finding.

- We develop prototype implementations of IFC-CEGAR and IFC-BMC and
present an experimental evaluation of these methods on a set of
benchmarks/microbenchmarks. Our results demonstrate that IFC-CEGAR and IFC-BMC
easily outperform an eager self-composition that uses the same backend
verification engines.


    int steps = 0;   /* counts inner-loop iterations of the helpers (not shown) */

    for (i = 0; i < N; i++) { zero[i] = product[i] = 0; }
    for (i = 0; i < N*W; i++) {
      int bi = bigint_extract_bit(a, i);
      if (bi == 1) {
        bigint_shiftleft(b, i, shifted_b, &steps);
        bigint_add(product, shifted_b, product, &steps);
      } else {
        bigint_shiftleft(zero, i, shifted_zero, &steps);
        bigint_add(product, shifted_zero, product, &steps);
      }
    }


Listing 1. "BigInt" Multiplication


2 Motivating Example 


Listing 1 shows a snippet from a function that performs multiword multiplication.
The code snippet is instrumented to count the number of iterations of the
inner loop that are executed in bigint_shiftleft and bigint_add (not shown
for brevity). These iterations are counted in the variable steps. The security
requirement is that steps must not depend on the secret values in the array a;
array b is assumed to be public.

Static analyses, including those based on security types, will conclude that 
the variable steps is “high-security.” This is because steps is assigned in a 
conditional branch that depends on the high-security variable bi. However, this 
code is in fact safe because steps is incremented by the same value in both 
branches of the conditional statement. 

Our lazy self-composition will handle this example by first using a symbolic 
taint analysis to conclude that the variable steps is tainted. It will then self- 
compose only those parts of the program related to computation of steps, and 
discover that it is set to identical values in both copies, thus proving the program 
is secure. 
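The safety argument for Listing 1 can be checked by brute force on a toy instance. In the following C sketch, the sizes (2 words of 8 bits) and the bodies of `bigint_extract_bit`, `bigint_shiftleft` and `bigint_add` are assumptions: the paper does not show the helpers, so these stand-ins simply count one step per word processed. Listing 1 is wrapped in a hypothetical `multiply_steps`, and `steps_independent_of_a` performs the self-composition check by enumerating every secret `a` for a fixed public `b`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 2          /* toy size: number of words (assumption) */
#define W 8          /* toy size: bits per word (assumption) */
typedef uint8_t word_t;

/* Bit i of a multiword integer stored least-significant word first. */
int bigint_extract_bit(const word_t *a, int i) {
    return (a[i / W] >> (i % W)) & 1;
}

/* Stand-in: out = in << s, truncated to N*W bits; one step per word. */
void bigint_shiftleft(const word_t *in, int s, word_t *out, int *steps) {
    word_t tmp[N] = {0};
    for (int w = 0; w < N; w++) {
        (*steps)++;
        for (int b = 0; b < W; b++) {
            int src = w * W + b - s;
            if (src >= 0 && bigint_extract_bit(in, src))
                tmp[w] |= (word_t)(1u << b);
        }
    }
    memcpy(out, tmp, sizeof tmp);
}

/* Stand-in: out = x + y mod 2^(N*W); one step per word. */
void bigint_add(const word_t *x, const word_t *y, word_t *out, int *steps) {
    unsigned carry = 0;
    for (int w = 0; w < N; w++) {
        (*steps)++;
        unsigned sum = (unsigned)x[w] + y[w] + carry;
        out[w] = (word_t)sum;
        carry = sum >> W;
    }
}

/* Listing 1, returning the final value of steps. */
int multiply_steps(const word_t *a, const word_t *b, word_t *product) {
    word_t zero[N], shifted_b[N], shifted_zero[N];
    int steps = 0;
    for (int i = 0; i < N; i++) { zero[i] = product[i] = 0; }
    for (int i = 0; i < N * W; i++) {
        int bi = bigint_extract_bit(a, i);
        if (bi == 1) {
            bigint_shiftleft(b, i, shifted_b, &steps);
            bigint_add(product, shifted_b, product, &steps);
        } else {
            bigint_shiftleft(zero, i, shifted_zero, &steps);
            bigint_add(product, shifted_zero, product, &steps);
        }
    }
    return steps;
}

/* Self-composition by enumeration: with b fixed, every secret a must give the
   same steps as a = 0. Returns 1 if no leak is found. */
int steps_independent_of_a(const word_t *b) {
    assert(N == 2);   /* the enumeration below packs v into exactly two words */
    word_t a0[N] = {0}, a[N], prod[N];
    int ref = multiply_steps(a0, b, prod);
    for (unsigned v = 0; v < (1u << (N * W)); v++) {
        a[0] = (word_t)(v & 0xFFu);
        a[1] = (word_t)(v >> W);
        if (multiply_steps(a, b, prod) != ref) return 0;
    }
    return 1;
}
```

With these stand-ins every loop iteration adds 2N steps whichever branch is taken, so `steps` is always 2·N·N·W regardless of `a`, which is the fact the lazy self-composition discovers symbolically.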

Now consider the case when the code in Listing 1 is used to multiply two bigints
of differing widths, e.g., a 512-bit integer is multiplied with a 2048-bit integer.
If this occurs, the upper 1536 bits of a will all be zeros, and bi will not be a
high-security variable for these iterations of the loop. Such a scenario can benefit
from early termination in our BMC-based method: our analysis will determine
that no tainted value flows to the low-security variable steps after iteration 512
and will immediately terminate the analysis.


3 Preliminaries 


We consider First Order Logic modulo a theory T and denote it by FOL(T).
Given a program P, we define a safety verification problem w.r.t. P as a
transition system $M = (X, Init(X), Tr(X, X'), Bad(X))$, where X denotes a set of
(uninterpreted) constants, representing program variables; Init, Tr and Bad are
(quantifier-free) formulas in FOL(T) representing the initial states, transition
relation and bad states, respectively. The states of a transition system correspond
to structures over a signature $\Sigma = \Sigma_T \cup X$. We write $Tr(X, X')$ to
denote that Tr is defined over the signature $\Sigma_T \cup X \cup X'$, where X is
used to represent the pre-state of a transition, and $X' = \{a' \mid a \in X\}$ is
used to represent the post-state.

A safety verification problem is to decide whether a transition system M is
SAFE or UNSAFE. We say that M is UNSAFE iff there exists a number N such
that the following formula is satisfiable:

$$
Init(X_0) \wedge \Big(\bigwedge_{i=0}^{N-1} Tr(X_i, X_{i+1})\Big) \wedge Bad(X_N) \qquad (1)
$$

where $X_i = \{a_i \mid a \in X\}$ is a copy of the program variables (uninterpreted
constants) used to represent the state of the system after the execution of i steps.

When M is UNSAFE and $s_N \in Bad$ is reachable, the path from $s_0 \in Init$ to
$s_N$ is called a counterexample (CEX).

A transition system M is SAFE iff the transition system has no counterexample,
of any length. Equivalently, M is SAFE iff there exists a formula Inv,
called a safe inductive invariant, that satisfies: (i) $Init(X) \rightarrow Inv(X)$,
(ii) $Inv(X) \wedge Tr(X, X') \rightarrow Inv(X')$, and (iii) $Inv(X) \rightarrow \neg Bad(X)$.
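For intuition, the three conditions on a safe inductive invariant can be checked directly on a tiny explicit-state system. The following C sketch replaces the symbolic formulas by sets of states encoded as 32-bit masks, which is feasible only for toy systems; the type `ts` and the function `is_safe_inductive` are hypothetical and do not correspond to any particular model checker.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STATES 32

/* Explicit-state stand-in for M = (X, Init, Tr, Bad): sets of states are
   bitmasks and tr[s] is the set of successors of state s. */
typedef struct {
    int n;                   /* number of states, at most MAX_STATES */
    uint32_t init, bad;      /* Init and Bad as sets of states */
    uint32_t tr[MAX_STATES];
} ts;

/* Checks conditions (i)-(iii) for a candidate invariant inv. */
int is_safe_inductive(const ts *m, uint32_t inv) {
    assert(m->n > 0 && m->n <= MAX_STATES);
    if ((m->init & ~inv) != 0) return 0;              /* (i)   Init -> Inv */
    for (int s = 0; s < m->n; s++)
        if (((inv >> s) & 1u) && (m->tr[s] & ~inv) != 0)
            return 0;                                 /* (ii)  Inv & Tr -> Inv' */
    return (inv & m->bad) == 0;                       /* (iii) Inv -> !Bad */
}
```

On a four-state system with edges 0→1, 1→0, 2→3, 3→3, Init = {0} and Bad = {3}, the set {0, 1} satisfies all three conditions, whereas {0} is not inductive and {0, 1, 2} is not closed under Tr.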

In SAT-based model checking (e.g., based on IC3 [7] or interpolants [23,34]),
the verification procedure maintains an inductive trace of formulas
$[F_0(X), \ldots, F_N(X)]$ that satisfy: (i) $Init(X) \rightarrow F_0(X)$,
(ii) $F_i(X) \wedge Tr(X, X') \rightarrow F_{i+1}(X')$ for every $0 \le i < N$, and
(iii) $F_i(X) \rightarrow \neg Bad(X)$ for every $0 \le i \le N$.
A trace $[F_0, \ldots, F_N]$ is closed if
$\exists\, 1 \le i \le N \cdot F_i \rightarrow \big(\bigvee_{j=0}^{i-1} F_j\big)$.
There is an obvious relationship between the existence of closed traces and the
safety of a transition system: a transition system M is SAFE iff it admits a safe
closed trace. Thus, safety verification is reduced to searching for a safe closed
trace or finding a CEX.
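The closed-trace criterion can likewise be illustrated with explicit states. The following C sketch builds one specific inductive trace, the exact frames $F_i$ = states reachable within i steps, and stops either when a frame meets Bad or when the trace closes. IC3 and interpolation compute over-approximate frames instead, so this is only an analogy under the assumption of a tiny explicit state space; `ts`, `post` and `check_by_frames` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STATES 32

typedef struct {
    int n;                   /* number of states, at most MAX_STATES */
    uint32_t init, bad;      /* Init and Bad as bitmasks of states */
    uint32_t tr[MAX_STATES]; /* tr[s] = successors of s */
} ts;

/* Image of a set of states under Tr. */
uint32_t post(const ts *m, uint32_t set) {
    uint32_t out = 0;
    for (int s = 0; s < m->n; s++)
        if ((set >> s) & 1u) out |= m->tr[s];
    return out;
}

/* Returns 1 (SAFE) or 0 (UNSAFE); *depth receives the index i of the frame
   that closes the trace or first intersects Bad. */
int check_by_frames(const ts *m, int *depth) {
    assert(m->n > 0 && m->n <= MAX_STATES);
    uint32_t f = m->init;      /* F_0 = Init */
    uint32_t seen = 0;         /* F_0 | ... | F_{i-1} */
    for (int i = 0; ; i++) {
        if (f & m->bad) { *depth = i; return 0; }               /* CEX of length <= i */
        if (i > 0 && (f & ~seen) == 0) { *depth = i; return 1; } /* F_i -> F_0 | ... | F_{i-1} */
        seen |= f;
        f |= post(m, f);       /* F_{i+1} = F_i | post(F_i) */
    }
}
```

Because the frames grow monotonically inside a finite state set, the loop terminates after at most n + 1 iterations.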


4 Information Flow Analysis 


Let P be a program over a set of program variables X. Recall that Init(X) is
a formula describing the initial states and $Tr(X, X')$ a transition relation. We
assume a "stuttering" transition relation, namely, Tr is reflexive and therefore it
can non-deterministically either move to the next state or stay in the same state.
Let us assume that $H \subseteq X$ is a set of high-security variables and
$L := X \setminus H$ is a set of low-security variables.

For each $x \in L$, let $Obs_x(X)$ be a predicate over program variables X that
determines when variable x is adversary-observable. The precise definition of
$Obs_x(X)$ depends on the threat model being considered. A simple model would
be that for each low variable $x \in L$, $Obs_x(X)$ holds only at program
completion; this corresponds to a threat model where the adversary can run a
program that operates on some confidential data and observe its public
(low-security) outputs after completion. A more sophisticated definition of
$Obs_x(X)$ could consider, for example, a concurrently executing adversary.
Appropriate definitions of $Obs_x(X)$ can also model declassification [29], by
setting $Obs_x(X)$ to be false in program states where the declassification of x
is allowed.

The information flow problem checks whether there exists an execution of
P such that the value of variables in H affects a variable $x \in L$ in some state
where the predicate $Obs_x(X)$ holds. Intuitively, information flow analysis checks
if low-security variables "leak" information about high-security variables.

We now describe our formulations of two standard techniques that have been 
used to perform information flow analysis. The first is based on taint analy- 
sis [30], but we use a symbolic (rather than a dynamic) analysis that tracks 
taint in a path-sensitive manner over the program. The second is based on self- 
composition [5], where two copies of the program are created and a safety prop- 
erty is checked over the composed program. 


4.1 Symbolic Taint Analysis 


When using taint analysis for checking information flow, we mark high-security
variables with a "taint" and check if this taint can propagate to low-security
variables. The propagation of taint through program variables of P is determined
by both assignments and the control structure of P. In order to perform precise
taint analysis, we formulate it as a safety verification problem. For this purpose,
for each program variable $x \in X$, we introduce a new "taint" variable $x_t$. Let
$X_t := \{x_t \mid x \in X\}$ be the set of taint variables, where $x_t \in X_t$ is of
sort Boolean. Let us define a transition system $M_t := (Y, Init_t, Tr_t, Bad_t)$
where $Y := X \cup X_t$ and

$$
Init_t(Y) := Init(X) \wedge \Big(\bigwedge_{x \in H} x_t\Big) \wedge \Big(\bigwedge_{x \in L} \neg x_t\Big) \qquad (2)
$$
$$
Tr_t(Y, Y') := Tr(X, X') \wedge \Phi(Y, X'_t) \qquad (3)
$$
$$
Bad_t(Y) := \Big(\bigvee_{x \in L} Obs_x(X) \wedge x_t\Big) \qquad (4)
$$

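Since taint analysis tracks information flow from high-security to low-security
variables, variables in $H_t$ are initialized to true while variables in $L_t$ are
initialized to false. W.l.o.g., let us denote the state update for a program
variable $x \in X$ as $x' = cond(X)\ ?\ \varphi_1(X) : \varphi_2(X)$. Let $\varphi$
be a formula over X. We capture the taint of $\varphi$ by:

$$
\Theta(\varphi) =
\begin{cases}
false & \text{if } \varphi \cap X = \emptyset \\
\bigvee_{x \in \varphi} x_t & \text{otherwise}
\end{cases}
$$

Thus, $\Phi(Y, X'_t)$ is defined as:

$$
\Phi(Y, X'_t) := \bigwedge_{x \in X} x'_t = \Theta(cond) \vee \big(cond\ ?\ \Theta(\varphi_1) : \Theta(\varphi_2)\big)
$$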

Intuitively, taint may propagate from $x_2$ to $x_1$ either when $x_1$ is assigned
an expression that involves $x_2$ or when an assignment to $x_1$ is controlled by
$x_2$. The bad states ($Bad_t$) are all states where a low-security variable is
tainted and observable.
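The update rule for a single taint variable, $x'_t = \Theta(cond) \vee (cond\ ?\ \Theta(\varphi_1) : \Theta(\varphi_2))$, can be sketched in C with variable sets as bitmasks. This is only an illustration under an assumption: here `cond_value` is a concrete truth value of the condition in the current state, whereas the formal encoding keeps it symbolic; that is what makes the analysis path-sensitive in both cases. `theta` and `taint_update` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Theta(phi): false if phi mentions no variable, otherwise the disjunction
   of the taint bits of the variables it mentions. */
int theta(uint32_t vars_of_phi, uint32_t taint) {
    return (vars_of_phi & taint) != 0;
}

/* x'_t for the update x' = cond ? phi1 : phi2. */
int taint_update(uint32_t vars_cond, int cond_value,
                 uint32_t vars_phi1, uint32_t vars_phi2, uint32_t taint) {
    return theta(vars_cond, taint) ||
           (cond_value ? theta(vars_phi1, taint) : theta(vars_phi2, taint));
}
```

With h as variable 0 (tainted) and l as variable 1, the update `x' = (l > 0) ? h : l` taints x only on the path where the condition holds, while `x' = (h > 0) ? l : l` always taints x through the condition, even though the assigned value does not depend on h.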
4.2 Self-composition


When using self-composition, information flow is tracked over an execution of
two copies of the program, P and $P_d$. Let us denote $X_d := \{x_d \mid x \in X\}$
as the set of program variables of $P_d$. Similarly, let $Init_d(X_d)$ and
$Tr_d(X_d, X'_d)$ denote the initial states and transition relation of $P_d$. Note
that $Init_d$ and $Tr_d$ are computed from Init and Tr by means of substitutions,
namely, substituting every occurrence of $x \in X$ or $x' \in X'$ with $x_d \in X_d$
and $x'_d \in X'_d$, respectively. Similarly to taint analysis, we formulate
information flow over a self-composed program as a safety verification problem:
$M_s := (Z, Init_s, Tr_s, Bad_s)$ where $Z := X \cup X_d$ and

$$
Init_s(Z) := Init(X) \wedge Init_d(X_d) \wedge \Big(\bigwedge_{x \in L} x = x_d\Big) \qquad (5)
$$
$$
Tr_s(Z, Z') := Tr(X, X') \wedge Tr_d(X_d, X'_d) \qquad (6)
$$
$$
Bad_s(Z) := \Big(\bigvee_{x \in L} Obs_x(X) \wedge Obs_x(X_d) \wedge \neg(x = x_d)\Big) \qquad (7)
$$

In order to track information flow, variables in $L_d$ are initialized to be equal
to their counterparts in L, while variables in $H_d$ remain unconstrained. A leak
is captured by the bad states (i.e., $Bad_s$). More precisely, there exists a leak iff
there exists an execution of $M_s$ that results in a state where $Obs_x(X)$ and
$Obs_x(X_d)$ hold and $x \neq x_d$ for a low-security variable $x \in L$.


5 Lazy Self-composition for Information Flow Analysis 


In this section, we introduce lazy self-composition for information flow analysis.
It is based on an interplay between symbolic taint analysis on a single copy
and safety verification on a self-composition, which were both described in the
previous section.

Recall that taint analysis is imprecise for determining secure information
flow in the sense that it may report spurious counterexamples, namely, spurious
leaks. In contrast, self-composition is precise, but less efficient. The fact that
self-composition requires a duplication of the program often hinders its
performance. The main motivation for lazy self-composition is to target both
efficiency and precision.

Intuitively, the model for symbolic taint analysis $M_t$ can be viewed as an
abstraction of the self-composed model $M_s$, where the Boolean variables in $M_t$
are predicates tracking the states where $x \neq x_d$ for some $x \in X$. This
intuition is captured by the following statement: $M_t$ over-approximates $M_s$.
Corollary 1. If there exists a path in $M_s$ from $Init_s$ to $Bad_s$, then there
exists a path in $M_t$ from $Init_t$ to $Bad_t$.


Corollary 2. If there exists no path in $M_t$ from $Init_t$ to $Bad_t$, then there
exists no path in $M_s$ from $Init_s$ to $Bad_s$.


This abstraction-based view relating symbolic taint analysis and self-composition
can be exploited in different verification methods for checking secure
information flow. In this paper, we focus on two: a CEGAR-based method
(IFC-CEGAR) and a BMC-based method (IFC-BMC). These methods using
lazy self-composition are now described in detail.


5.1 IFC-CEGAR


We make use of the fact that $M_t$ can be viewed as an abstraction w.r.t. $M_s$, and
propose an abstraction-refinement paradigm for secure information flow analysis.
In this setting, $M_t$ is used to find a possible counterexample, i.e., a path that
leaks information. Then, $M_s$ is used to check if this counterexample is spurious
or real. In case the counterexample is found to be spurious, IFC-CEGAR uses
the proof that shows why the counterexample is not possible in $M_s$ to refine $M_t$.

A sketch of IFC-CEGAR appears in Algorithm 1. Recall that we assume that
solving a safety verification problem is done by maintaining an inductive trace.
We denote the traces for $M_t$ and $M_s$ by $G = [G_0, \ldots, G_k]$ and
$H = [H_0, \ldots, H_k]$, respectively. IFC-CEGAR starts by initializing $M_t$, $M_s$
and their respective traces G and H (lines 1-4). The main loop of IFC-CEGAR
(lines 5-18) starts by looking for a counterexample over $M_t$ (line 6). In case no
counterexample is found, IFC-CEGAR declares there are no leaks and returns SAFE.

If a counterexample $\pi$ is found in $M_t$, IFC-CEGAR first updates the trace
of $M_s$, i.e., H, by rewriting G (line 10). In order to check if $\pi$ is spurious,
IFC-CEGAR creates a new safety verification problem $M_c$, a version of $M_s$
constrained by $\pi$ (line 11), and solves it (line 12). If $M_c$ has a counterexample,
IFC-CEGAR returns UNSAFE. Otherwise, G is updated by H (line 16) and
$M_t$ is refined such that $\pi$ is ruled out (line 17).

The above gives a high-level overview of how Irc-CEGAR operates. We 
now go into more detail. More specifically, we describe the functions ReWrite, 
Constraint and Refine. We note that these functions can be designed and 
implemented in several different ways. In what follows we describe some possible 
choices. 


Proof-Based Abstraction. Let us assume that when solving $M_t$ a counterexample
$\pi$ of length k is found and an inductive trace G is computed. Following a
proof-based abstraction approach, Constraint() uses the length of $\pi$ to bound
the length of possible executions in $M_s$ by k. Intuitively, this is similar to
bounding the length of the computed inductive trace over $M_s$.

In case $M_c$ has a counterexample, a real leak (of length k) is found. Otherwise,
since $M_c$ considers all possible executions of $M_s$ of length k, IFC-CEGAR
deduces that there are no counterexamples of length k. In particular, the
counterexample $\pi$ is ruled out. IFC-CEGAR therefore uses this fact to refine
$M_t$ and G.


Algorithm 1. IFC-CEGAR(P, H)
Input: A program P and a set of high-security variables H
Output: SAFE, UNSAFE or UNKNOWN.
 1  M_t ← ConstructTaintModel(P, H)
 2  M_s ← ConstructSCModel(P, H)
 3  G ← [G_0 = Init_t]
 4  H ← [H_0 = Init_s]
 5  repeat
 6      (G, R_taint, π) ← MC.Solve(M_t, G)
 7      if R_taint = SAFE then
 8          return SAFE
 9      else
10          H ← ReWrite(G, H)
11          M_c ← Constraint(M_s, π)
12          (H, R_s, π') ← MC.Solve(M_c, H)
13          if R_s = UNSAFE then
14              return UNSAFE
15          else
16              G ← ReWrite(H, G)
17              M_t ← Refine(M_t, G)
18  until ∞
19  return UNKNOWN


Inductive Trace Rewriting. Consider the set of program variables X, taint
variables $X_t$, and self-composition variables $X_d$. As noted above, $M_t$
over-approximates $M_s$. Intuitively, it may mark a variable x as tainted when x
does not leak information. On the other hand, if a variable x is found to be
untainted in $M_t$, then it is known to also not leak information in $M_s$. More
formally, the following relation holds: $\neg x_t \rightarrow (x = x_d)$.

This gives us a procedure for rewriting a trace over $M_t$ to a trace over $M_s$.
Let $G = [G_0, \ldots, G_k]$ be an inductive trace over $M_t$. Considering the
definition of $M_t$, $G_i$ can be decomposed and rewritten as
$G_i(Y) := G_i^X(X) \wedge G_i^t(X_t) \wedge \Psi(X, X_t)$. Namely, $G_i^X(X)$ and
$G_i^t(X_t)$ are sub-formulas of $G_i$ over only X and $X_t$ variables, respectively,
and $\Psi(X, X_t)$ is the part connecting X and $X_t$.

Since G is an inductive trace, $G_i(Y) \wedge Tr_t(Y, Y') \rightarrow G_{i+1}(Y')$
holds. Following the definition of $Tr_t$ and the above decomposition of $G_i$,
the following holds:

$$
G_i^X(X) \wedge Tr(X, X') \rightarrow G_{i+1}^X(X')
$$
Let $H = [H_0, \ldots, H_k]$ be a trace w.r.t. $M_s$. We define the update of H by
G as the trace $H^* = [H_0^*, \ldots, H_k^*]$, which is defined as follows:

$$
H_0^* := Init_s \qquad (8)
$$
$$
H_i^*(Z) := H_i(Z) \wedge G_i^X(X) \wedge G_i^X(X_d) \wedge \Big(\bigwedge \{x = x_d \mid G_i(Y) \rightarrow \neg x_t\}\Big) \qquad (9)
$$

Intuitively, if a variable $x \in X$ is known to be untainted in $M_t$, using
Corollary 2 we conclude that $x = x_d$ in $M_s$.

A similar update can be defined when updating a trace G w.r.t. M by a trace 
H w.r.t. Ma. In this case, we use the following relation: =(# = rq) — z. Let 
H = |Ho(Z),..., H,(Z)] be the inductive trace w.r.t. Ma. H can be decomposed 
and written as H;(Z) := H;(X) A H#(Xa) A (X, Xa). 

Due to the definition of Mq and an inductive trace, the following holds: 


Hy(X) ^ Tr(X, X') > H;(X’) 
H$ (Xa) ^ Tr(Xa, X4) + F(X 


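We can therefore update a trace $G = [G_0, \ldots, G_k]$ w.r.t. $M_t$ by defining
the trace $G^* = [G_0^*, \ldots, G_k^*]$, where:

$$G_0^* := Init_t \tag{10}$$
$$G_i^*(Y) := G_i(Y) \land H_i^x(X) \land H_i^a(X) \land \Big(\bigwedge \{x_t \mid H_i(Z) \rightarrow \neg(x = x_a)\}\Big) \tag{11}$$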


Updating $G$ by $H$, and vice versa, as described above is based on the fact
that $M_t$ over-approximates $M_s$ w.r.t. tainted variables (namely, Corollaries 1
and 2). It is therefore important to note that $G^*$, in particular, does not
“gain” more precision due to this process.


Lemma 1. Let $G$ be an inductive trace w.r.t. $M_t$ and $H$ an inductive trace
w.r.t. $M_s$. Then, the updated $H^*$ and $G^*$ are inductive traces w.r.t. $M_s$
and $M_t$, respectively.


Refinement. Recall that in the current scenario, a counterexample was found
in $M_t$ and was shown to be spurious in $M_s$. This fact can be used to refine
both $M_t$ and $G$.

As a first step, we observe that if $x = x_a$ in $M_s$, then $\neg x_t$ should hold
in $M_t$. However, since $M_t$ is an over-approximation, it may allow $x$ to be
tainted, namely, allow $x_t$ to evaluate to true.

In order to refine $M_t$ and $G$, we define a strengthening procedure for $G$,
which resembles the updating procedure described above. Let
$H = [H_0, \ldots, H_k]$ be a trace w.r.t. $M_s$ and $G = [G_0, \ldots, G_k]$ be a
trace w.r.t. $M_t$; then the strengthening of $G$ is denoted as
$G^r = [G_0^r, \ldots, G_k^r]$ such that:
$$G_0^r := Init_t \tag{12}$$
$$G_i^r(Y) := G_i(Y) \land H_i^x(X) \land H_i^a(X) \land \Big(\bigwedge \{x_t \mid H_i(Z) \rightarrow \neg(x = x_a)\}\Big) \land \Big(\bigwedge \{\neg x_t \mid H_i(Z) \rightarrow (x = x_a)\}\Big) \tag{13}$$
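The taint conjuncts of Eq. (13) can be sketched in the same style. Again this is a hypothetical brute-force illustration over Boolean variables (the `_a` and `_t` suffixes are our naming convention), not the paper's implementation, and it omits the $H_i^x(X) \land H_i^a(X)$ conjuncts. If $H_i$ is unsatisfiable, both $x_t$ and $\neg x_t$ are added, so $G_i^r$ becomes false, exactly as the formula prescribes.

```python
from itertools import product


def assignments(names):
    """Yield every Boolean assignment to `names` as a dict."""
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def taint_literals(H_i, xs):
    """Literals (x, v), meaning x_t == v, that Eq. (13) conjoins to G_i."""
    zs = list(xs) + [x + "_a" for x in xs]
    models = [z for z in assignments(zs) if H_i(z)]
    lits = []
    for x in xs:
        if all(z[x] == z[x + "_a"] for z in models):
            lits.append((x, False))   # H_i -> (x = x_a): conjoin not x_t
        if all(z[x] != z[x + "_a"] for z in models):
            lits.append((x, True))    # H_i -> not (x = x_a): conjoin x_t
    return lits


def strengthen_frame(G_i, H_i, xs):
    """Hypothetical sketch of G^r_i from Eq. (13), without the H_i^x, H_i^a conjuncts."""
    lits = taint_literals(H_i, xs)

    def G_r(y):
        return G_i(y) and all(y[x + "_t"] == v for x, v in lits)

    return G_r, lits
```

With $H_1 \equiv (l = l_a)$ and an uninformative $G_1$, the strengthened frame forces $\neg l_t$ while leaving $h_t$ free; if $M_t$ still lets $l_t$ become true in the next frame, this is exactly the situation the refinement step below repairs.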


The above gives us a procedure for strengthening $G$ by using $H$. Note that
since $M_t$ is an over-approximation of $M_s$, it may allow a variable $x \in X$ to
be tainted, while in $M_s$ (and therefore in $H$), $x = x_a$. As a result, after
strengthening, $G^r$ is not necessarily an inductive trace w.r.t. $M_t$; namely,
$G_i^r \land Tr_t \rightarrow {G_{i+1}^r}'$ does not necessarily hold. In order to
make $G^r$ an inductive trace, $M_t$ must be refined.

Let us assume that $G_i^r \land Tr_t \rightarrow {G_{i+1}^r}'$ does not hold. Then
$G_i^r \land Tr_t \land \neg {G_{i+1}^r}'$ is satisfiable. Considering the way $G^r$
is strengthened, there exists $x \in X$ such that $G_i^r \land Tr_t \land x_t'$ is
satisfiable and $G_{i+1}^r \Rightarrow \neg x_t$. The refinement step is defined by:


$$x_t' = G_i^r \;?\; \mathit{false} : \big(\Theta(\mathit{cond}) \lor (\mathit{cond} \;?\; \Theta(\varphi_1) : \Theta(\varphi_2))\big)$$


This refinement step changes the next-state function of $x_t$ such that whenever
$G_i^r$ holds, $x_t$ is forced to be false at the next time frame.
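The refined next-state function can be pictured as a closure that wraps the original taint update. In this sketch the taint function $\Theta$ is a hypothetical stand-in (an expression counts as tainted if any variable it reads is tainted), and the update is assumed to have the form $x := \mathit{cond} \;?\; \varphi_1 : \varphi_2$; the actual taint rules of Sect. 4.1 may be more precise.

```python
def theta(read_vars, state):
    """Hypothetical taint of an expression: tainted iff any variable it reads is tainted."""
    return any(state[v + "_t"] for v in read_vars)


def refined_next_taint(frame, cond, cond_vars, phi1_vars, phi2_vars):
    """x_t' = frame ? false : (Theta(cond) or (cond ? Theta(phi1) : Theta(phi2)))."""
    def next_x_t(state):
        if frame(state):          # refinement: G^r_i holds, so force x_t' to false
            return False
        if theta(cond_vars, state):
            return True           # implicit flow through a tainted condition
        return theta(phi1_vars, state) if cond(state) else theta(phi2_vars, state)
    return next_x_t
```

Passing a frame that is always false recovers the unrefined update; passing the strengthened frame $G_i^r$ yields the refined model $M_t^r$ used in Lemma 2.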


Lemma 2. Let $G^r$ be a strengthened trace, and let $M_t^r$ be the result of
refinement as defined above. Then, $G^r$ is an inductive trace w.r.t. $M_t^r$.


Theorem 1. Let $\mathfrak{A}$ be a sound and complete model checking algorithm
w.r.t. FOL($T$) for some $T$, such that $\mathfrak{A}$ maintains an inductive trace.
Assuming IFC-CEGAR uses $\mathfrak{A}$, IFC-CEGAR is both sound and complete.


Proof (Sketch). Soundness follows directly from the soundness of taint analysis.
For completeness, assume $M_s$ is SAFE. Due to our assumption that $\mathfrak{A}$ is
sound and complete, $\mathfrak{A}$ emits a closed inductive trace $H$. Intuitively,
assuming $H$ is of size $k$, the next-state function of every taint variable in
$M_t$ can be refined to be a constant false after a specific number of steps. Then,
$H$ can be translated to a closed inductive trace $G$ over $M_t$ by following the
formalism presented above. Using Lemma 2, we can show that a closed inductive
trace exists for the refined taint model.


5.2 IFC-BMC


In this section we introduce a different method based on Bounded Model Check- 
ing (BMC) [6] that uses lazy self-composition for solving the information flow 
security problem. This approach is described in Algorithm 2. In addition to the
program $P$ and the specification of high-security variables $H$, it uses an extra
parameter BND that limits the maximum number of loop unrolls performed on
the program $P$. (Alternatively, one can fall back to an unbounded verification
method after BND is reached in BMC.)
Algorithm 2. IFC-BMC(P, H, BND)
Input: A program P, a set of high-security variables H, max unroll bound BND
Output: SAFE, UNSAFE or UNKNOWN.
1  i ← 0
2  repeat
3      M(i) ← LoopUnroll(P, i)
4      M_t(i) ← EncodeTaint(M(i))
5      TR of M_s(i) ← LazySC(M(i), M_t(i))
6      Bad of M_s(i) ← ⋁_{y ∈ L} ¬(y = y')
7      result ← SolveSMT(M_s(i))
8      if result = counterexample then
9          return UNSAFE
10     live_taint ← CheckLiveTaint(M_t(i))
11     if live_taint = false then
12         return SAFE
13     i ← i + 1
14 until i = BND
15 return UNKNOWN


Algorithm 3. LazySC(M, M_t)
Input: A program model M and the corresponding taint program model M_t
Output: Transition relation of the self-composed program Tr_s
1 for each state update x ← φ in transition relation of M do
2     add state update x ← φ to Tr_s
3     tainted ← SolveSMT(query on x_t in M_t)
4     if tainted = false then
5         add state update x' ← x to Tr_s
6     else
7         add state update x' ← duplicate(φ) to Tr_s
8 return Tr_s


In each iteration of the algorithm (line 2), loops in the program $P$ are unrolled
(line 3) to produce a loop-free program, encoded as a transition system $M(i)$. A
new transition system $M_t(i)$ is created (line 4) following the method described
in Sect. 4.1, to capture precise taint propagation in the unrolled program $M(i)$.
Then lazy self-composition is applied (line 5), as shown in detail in Algorithm 3,
based on the interplay between the taint model $M_t(i)$ and the transition system
$M(i)$. In detail, for each variable $x$ updated in $M(i)$, where the state update is
denoted $x := \varphi$, we use $x_t$ in $M_t(i)$ to encode whether $x$ is possibly
tainted. We generate an SMT query to determine if $x_t$ is satisfiable. If it is
unsatisfiable, i.e., $x_t$ evaluates to false, we can conclude that high-security
variables cannot affect the value of $x$. In this case, its duplicate variable $x'$ in
the self-composed program $M_s(i)$ is set equal to $x$, eliminating the need to
duplicate the computation that will produce $x'$. Otherwise, if $x_t$ is satisfiable
(or unknown), we duplicate $\varphi$ and update $x'$ accordingly.
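The following Python sketch mirrors the structure of Algorithm 3 on straight-line code. It is illustrative only: the per-variable SMT query of line 3 is replaced by simple syntactic taint propagation (a variable is tainted if any variable it reads is tainted), and a program is represented as a hypothetical list of `(x, reads, fn)` triples meaning $x := \varphi$. Untainted updates get the cheap copy $x' := x$; tainted ones are re-evaluated over the primed copy of their inputs.

```python
def _duplicate(fn, reads):
    """Return phi evaluated over the primed copy of the variables it reads."""
    def dup(env):
        return fn({**env, **{v: env[v + "'"] for v in reads}})
    return dup


def _copy(x):
    """Return the cheap update x' := x."""
    return lambda env: env[x]


def lazy_sc(program, high):
    """Sketch of Algorithm 3 for straight-line code.

    program: list of (x, reads, fn) meaning x := fn(env), where fn reads only `reads`.
    high: names of high-security inputs.
    Assumption: the SMT taint query of line 3 is replaced by syntactic propagation.
    """
    tainted = set(high)
    tr_s = []
    for x, reads, fn in program:
        tr_s.append((x, fn, "orig"))                              # line 2: x := phi
        if any(v in tainted for v in reads):                      # line 3 (stand-in)
            tainted.add(x)
            tr_s.append((x + "'", _duplicate(fn, reads), "dup"))  # line 7
        else:
            tainted.discard(x)
            tr_s.append((x + "'", _copy(x), "copy"))              # line 5
    return tr_s


def run(tr_s, env):
    """Execute the self-composed updates in order on a copy of `env`."""
    env = dict(env)
    for target, update, _kind in tr_s:
        env[target] = update(env)
    return env
```

For a program `a := l + 1; b := h * a; y := a * 2` with secret `h`, only `b` is duplicated; the low output `y` is copied and therefore agrees across the two copies by construction.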

The self-composed program $M_s(i)$ created by LazySC (Algorithm 3) is then
checked by a bounded model checker, where a bad state is a state where any
low-security output $y$ ($y \in L$, where $L$ denotes the set of low-security
variables) has a different value than its duplicate variable $y'$ (line 6). (For ease of
exposition, a simple definition of bad states is shown here. This can be suit- 
ably modified to account for Obs,(X) predicates described in Sect. 4.) A coun- 
terexample produced by the solver indicates a leak in the original program P. 
We also use an early termination check for BMC encoded as an SMT-based 
query CheckLiveTaint, which essentially checks whether any live variable is 
tainted (line 10). If none of the live variables is tainted, i.e., any initial taint 
from high-security inputs has been squashed, then IFC-BMC can stop unrolling
the program any further. If no conclusive result is obtained, IFC-BMC will return
UNKNOWN.
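The control flow of Algorithm 2, including the bad-state predicate of line 6 and the early-termination check, can be sketched as follows. All callbacks (`loop_unroll`, `encode_taint`, `lazy_sc`, `find_bad_state`, `check_live_taint`) are hypothetical placeholders for the SEAHORN/Z3-based components; unlike line 4 of Algorithm 2, the sketch passes $H$ to `encode_taint` explicitly.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def bad_state(state: Dict[str, Any], low: Iterable[str]) -> bool:
    """Line 6: some low-security y differs from its duplicate y'."""
    return any(state[y] != state[y + "'"] for y in low)


def ifc_bmc(P, H, BND: int,
            loop_unroll: Callable, encode_taint: Callable, lazy_sc: Callable,
            find_bad_state: Callable[[Any], Optional[Dict[str, Any]]],
            check_live_taint: Callable[[Any], bool]) -> str:
    """Control-flow sketch of Algorithm 2; all callbacks are hypothetical.

    As in the repeat/until loop of Algorithm 2, BND is assumed to be at least 1.
    """
    i = 0
    while True:                                   # line 2: repeat
        M = loop_unroll(P, i)                     # line 3
        M_t = encode_taint(M, H)                  # line 4
        M_s = lazy_sc(M, M_t)                     # lines 5-6
        if find_bad_state(M_s) is not None:       # lines 7-9: counterexample
            return "UNSAFE"
        if not check_live_taint(M_t):             # lines 10-12: no live taint
            return "SAFE"
        i += 1                                    # line 13
        if i == BND:                              # line 14
            return "UNKNOWN"                      # line 15
```

In a toy run, `find_bad_state` can simply scan concrete final states of both copies with `bad_state`; a real implementation would instead ask the SMT solver whether a bad state is reachable in $M_s(i)$.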


6 Implementation and Experiments 


We have implemented prototypes of IFC-CEGAR and IFC-BMC for information
flow checking. Both are implemented on top of SEAHORN [18], a software
verification platform that encodes programs as CHC (Constrained Horn Clause)
rules. It has a frontend based on LLVM [22] and backends to Z3 [15] and other
solvers. Our prototype has a few limitations. For example, it does not support
bit-precise reasoning or complex data structures such as lists.
Our implementation of symbolic taint analysis is flexible in supporting any given 
taint policy (i.e., rules for taint generation, propagation, and removal). It uses 
an encoding that fully leverages SMT-based model checking techniques for pre- 
cise taint analysis. We believe this module can be independently used in other 
applications for security verification. 


6.1 Implementation Details 


IFC-CEGAR Implementation. As discussed in Sect. 5.1, the IFC-CEGAR
implementation uses taint analysis and self-composition synergistically and is
tailored toward proving that programs are secure. Both taint analysis and
self-composition are implemented as LLVM passes that instrument the program.
Our prototype implementation executes these two passes alternately as the
problem is being solved. The IFC-CEGAR implementation uses Z3's CHC solver
engine, called SPACER. SPACER, and therefore our IFC-CEGAR implementation,
does not handle the bitvector theory, limiting the set of programs that can be
verified using this prototype. Extending the prototype to support this theory
will be the subject of future work.


IFC-BMC Implementation. In the IFC-BMC implementation, the loop unroller,
taint analysis, and lazy self-composition are implemented as passes that work on
CHC, to generate SMT queries that are passed to the backend Z3 solver. Since
the IFC-BMC implementation uses Z3, and not SPACER, it can handle all the
programs in our evaluation, unlike the IFC-CEGAR implementation.


Input Format. The input to our tools is a C-program with annotations indicating 
which variables are secret and the locations at which leaks should be checked. 
In addition, variables can be marked as untainted at specific locations. 


6.2 Evaluation Benchmarks 


For experiments, we used a machine running an Intel Core i7-4578U with 8 GB of
RAM. We tested our prototypes on several micro-benchmarks* in addition to
benchmarks inspired by real-world programs. For comparison against eager
self-composition, we used the SEAHORN backend solvers on a 2-copy version of the
benchmark. fibonacci is a micro-benchmark that computes the N-th Fibonacci
number. There are no secrets in the micro-benchmark, and this is a sanity check
taken from [33]. list_4/8/16 are programs working with linked lists; the trailing
number indicates the maximum number of nodes being used. Some linked list
nodes contain secrets while others have public data, and the verification problem
is to ensure that a particular function that operates on the linked list does not
leak the secret data. modadd_safe is a program that performs multi-word addition;
modexp_safe/unsafe are variants of a program performing modular exponentiation;
and pwdcheck_safe/unsafe are variants of a program that compares an
input string with a secret password. The verification problem in these examples
is to ensure that an iterator in a loop does not leak secret information, which
could allow a timing attack. Among these benchmarks, list_4/8/16 use
structs while modexp_safe/unsafe involve bitvector operations, both of which
are not supported by SPACER, and thus not by our IFC-CEGAR prototype.


6.3 IFC-CEGAR Results 


Table 1 shows the IFC-CEGAR results on benchmark examples with varying
parameter values. The columns show the time taken by eager self-composition
(Eager SC) and IFC-CEGAR, and the number of refinements in IFC-CEGAR.
“TO” denotes a timeout of 300 s.

We note that all examples are secure and do not leak information. Since 
our path-sensitive symbolic taint analysis is more precise than a type-based 
taint analysis, there are few counterexamples and refinements. In particular, 
for our first example pwdcheck_safe, self-composition is not required as our 
path-sensitive taint analysis is able to prove that no taint propagates to the 
variables of interest. It is important to note that type-based taint analysis cannot 
prove that this example is secure. For our second example, pwdcheck2_safe, our 
path-sensitive taint analysis is not enough. Namely, it finds a counterexample, 
due to an implicit flow where a for-loop is conditioned on a tainted value, but 
there is no real leak because the loop executes a constant number of times. 


* http://www.cs.princeton.edu/~aartig/benchmarks/ifc_bench.zip.
Table 1. IFC-CEGAR results (time in seconds)

Benchmark      | Parameter | Eager SC Time (s) | IFC-CEGAR Time (s) | #Refinements
pwdcheck_safe  | 4         | 8.8               | 0.2                | 0
               | 8         | TO                | 0.2                | 0
               | 16        | TO                | 0.2                | 0
               | 32        | TO                | 0.2                | 0
pwdcheck2_safe | N > 8     | TO                | 61                 | 1
modadd_safe    | 2048b     | 180               | 0.2                | 0
               | 4096b     | TO                | 0.3                | 0


Our refinement-based approach can easily handle this case, where IFC-CEGAR 
uses self-composition to find that the counterexample is spurious. It then refines 
the taint analysis model, and after one refinement step, it is able to prove that 
pwdcheck2_safe is secure. While these examples are fairly small, they clearly
show that IFC-CEGAR is superior to eager self-composition.


6.4 IFC-BMC Results


The experimental results for IFC-BMC are shown in Table 2, where we use some 
unsafe versions of benchmark examples as well. Results are shown for total time 
taken by eager self-composition (Eager SC) and the IFC-BMC algorithm. (As
before, “TO” denotes a timeout of 300 s.) IFC-BMC is able to produce an answer
significantly faster than eager self-composition for all examples. The last two 
columns show the time spent in taint checks in IFC-BMC, and the number of 
taint checks performed. 


Table 2. IFC-BMC results (time in seconds)

Benchmark       | Result | Eager SC Time (s) | IFC-BMC Time (s) | Taint checks Time (s) | #Taint checks
fibonacci       | SAFE   | 0.55              | 0.1              | 0.07                  | 85
list_4          | SAFE   | 2.9               | 0.15             | 0.007                 | 72
list_8          | SAFE   | 3.1               | 0.6              | 0.02                  | 144
list_16         | SAFE   | 3.2               | 1.83             | 0.08                  | 288
modexp_safe     | SAFE   | TO                | 0.05             | 0.01                  | 342
modexp_unsafe   | UNSAFE | TO                | 1.63             | 1.5                   | 364
pwdcheck_safe   | SAFE   | TO                | 0.05             | 0.01                  | 1222
pwdcheck_unsafe | UNSAFE | TO                | 1.63             | 1.5                   | 809
To study the scalability of our prototype, we tested IFC-BMC on the modular
exponentiation program with different values for the maximum size of the integer
array in the program. These results are shown in Table 3. Although the IFC-BMC
runtime grows exponentially, it is reasonably fast: less than 2 min for an array
of size 64.

Table 3. IFC-BMC results on modexp (time in seconds)

Benchmark | Parameter | Time (s) | #Taint checks
modexp    | 8         | 0.19     | 180
          | 16        | 1.6      | 364
          | 24        | 3.11     | 548
          | 32        | 8.35     | 732
          | 40        | 11.5     | 916
          | 48        | 21.6     | 1123
          | 56        | 35.6     | 1284
          | 64        | 85.44    | 1468


7 Related Work


A rich body of literature has studied the verification of secure information flow
in programs. Initial work dates back to Denning and Denning [16], who introduced
a program analysis to ensure that confidential data does not flow to
non-confidential outputs. This notion of confidentiality relates closely to (i)
non-interference, introduced by Goguen and Meseguer [17], and (ii) separability,
introduced by Rushby [27]. Each of these studies a notion of secure information
flow where confidential data is strictly not allowed to flow to any non-confidential
output. These definitions are often too restrictive for practical programs, where
secret data might sometimes be allowed to flow to some non-secret output (e.g.,
if the data is encrypted before output), i.e., they require declassification [29]. Our
approach allows easy and fine-grained declassification.

A large body of work has also studied the use of type systems that ensure
secure information flow. Due to a lack of space, we review a few exemplars and
refer the reader to Sabelfeld and Myers [28] for a detailed survey. Early work in
this area dates back to Volpano et al. [35], who introduced a type system that
maintains secure information flow based on the work of Denning and Denning [16].
Myers introduced the JFlow programming language (later known as Jif: Java
information flow) [25], which extended Java with security types. Jif has been
used to build clean-slate, secure implementations of complex end-to-end systems,
e.g., the Civitas [10] electronic voting system. More recently, Patrignani et
al. [26] introduced the Java Jr. language, which extends Java with a security type
system, automatically partitions the program into secure and non-secure parts,
and executes the secure parts inside so-called protected module architectures. In
contrast to these approaches, our work can be applied to existing security-critical
code in languages like C with the addition of only a few annotations.

A different approach to verifying secure information flow is the use of dynamic 
taint analysis (DTA) [8,12,13,21,30,31], which instruments a program with taint
variables and taint tracking code. Advantages of DTA are that it is scalable to 
very large applications [21], can be accelerated using hardware support [13], 
and tracks information flow across processes, applications and even over the 
network [12]. However, taint analysis necessarily involves imprecision and in 
practice leads to both false positives and false negatives. False positives arise 
because taint analysis is an overapproximation. Somewhat surprisingly, false 
negatives are also introduced because tracking implicit flows using taint analysis 
leads to a deluge of false positives [30], thus causing practical taint tracking
systems to ignore implicit flows. Our approach does not have this imprecision. 

Our formulation of secure information flow is based on the self-composition 
construction proposed by Barthe et al. [5]. A specific type of self-composition 
called product programs was considered by Barthe et al. [4], which does not allow 
control flow divergence between the two programs. In general, this might miss
certain bugs as it ignores implicit flows. However, it is useful in verifying crypto- 
graphic code which typically has very structured control flow. Almeida et al. [1] 
used the product construction to verify that certain functions in cryptographic 
libraries execute in constant-time. 

Terauchi and Aiken [33] generalized self-composition to consider k-safety,
which uses k − 1 compositions of a program with itself. Note that the property
checked by self-composition is a 2-safety property. An automated verifier for k-safety properties
of Java programs based on Cartesian Hoare Logic was proposed by Sousa and 
Dillig [32]. A generalization of Cartesian Hoare Logic, called Quantitative Carte- 
sian Hoare Logic was introduced by Chen et al. [8]; the latter can also be used to 
reason about the execution time of cryptographic implementations. Among these 
efforts, our work is most closely related to that of Terauchi and Aiken [33], who
used a type-based analysis as a preprocessing step to self-composition. We use a 
similar idea, but our taint analysis is more precise due to being path-sensitive, 
and it is used within an iterative CEGAR loop. Our path-sensitive taint analysis 
leads to fewer counterexamples and thereby cheaper self-composition, and our 
refinement approach can easily handle examples with benign branches. In con- 
trast to the other efforts, our work uses lazy instead of eager self-composition, 
and is thus more scalable, as demonstrated in our evaluation. A recent work [2] 
also employs trace-based refinement in security verification, but it does not use 
self-composition. 

Our approach has some similarities to other problems related to tainting [19]. 
In particular, Change-Impact Analysis is the problem of determining what parts 
of a program are affected due to a change. Intuitively, it can be seen as a form 
of taint analysis, where the change is treated as taint. To solve this, Gyori et 
al. [19] propose a combination of an imprecise type-based approach with a pre- 
cise semantics-preserving approach. The latter considers the program before 
and after the change and finds relational equivalences between the two
versions. These are then used to strengthen the type-based approach. While our
work has some similarities, there are crucial differences as well. First, our taint 
analysis is not type-based, but is path-sensitive and preserves the correctness 
of the defined abstraction. Second, our lazy self-composition is a form of an 
abstraction-refinement framework, and allows a tighter integration between the 
imprecise (taint) and precise (self-composition) models. 


8 Conclusions and Future Work 


A well-known approach for verifying secure information flow is based on the 
notion of self-composition. In this paper, we have introduced a new approach 
for this verification problem based on lazy self-composition. Instead of eagerly 
duplicating the program, lazy self-composition uses a synergistic combination 
of symbolic taint analysis (on a single-copy program) and self-composition by
duplicating relevant parts of the program, depending on the result of the taint 
analysis. We presented two instances of lazy self-composition: the first uses taint 
analysis and self-composition in a CEGAR loop; the second uses bounded model 
checking to dynamically query taint checks and self-composition based on the 
results of these dynamic checks. Our algorithms have been implemented in the 
SEAHORN verification platform and results show that lazy self-composition is 
able to verify many instances not verified by eager self-composition. 

In future work, we are interested in extending lazy self-composition to sup- 
port learning of quantified relational invariants. These invariants are often 
required when reasoning about information flow in shared data structures of 
unbounded size (e.g., unbounded arrays, linked lists) that contain both high- 
and low-security data. We are also interested in generalizing lazy self-composition 
beyond information-flow to handle other k-safety properties like injectivity, asso- 
ciativity and monotonicity. 
Abstract. Power side-channel attacks, capable of deducing secret data using
statistical analysis techniques, have become a serious threat to devices in cyber-physical
systems and the Internet of things. Random masking is a widely used counter- 
measure for removing the statistical dependence between secret data and side- 
channel leaks. Although there are techniques for verifying whether software code 
has been perfectly masked, they are limited in accuracy and scalability. To bridge 
this gap, we propose a refinement-based method for verifying masking counter- 
measures. Our method is more accurate than prior syntactic type inference based 
approaches and more scalable than prior model-counting based approaches using 
SAT or SMT solvers. Indeed, it can be viewed as a gradual refinement of a set 
of semantic type inference rules for reasoning about distribution types. These 
rules are kept abstract initially to allow fast deduction, and then made concrete 
when the abstract version is not able to resolve the verification problem. We have 
implemented our method in a tool and evaluated it on cryptographic benchmarks 
including AES and MAC-Keccak. The results show that our method significantly 
outperforms state-of-the-art techniques in terms of both accuracy and scalability. 


1 Introduction 


Cryptographic algorithms are widely used in embedded computing devices, including 
SmartCards, to form the backbone of their security mechanisms. In general, security is 
established by assuming that the adversary has access to the input and output, but not 
internals, of the implementation. Unfortunately, in practice, attackers may recover cryp- 
tographic keys by analyzing physical information leaked through side channels. These 
so-called side-channel attacks exploit the statistical dependence between secret data 
and non-functional properties of a computing device such as the execution time [38], 
power consumption [39] and electromagnetic radiation [49]. Among them, differential 
power analysis (DPA) is an extremely popular and effective class of attacks [30,42]. 
Fig. 1. Overview of SCInfer, where “ICR” denotes the intermediate computation result.


To thwart DPA attacks, masking has been proposed to break the statistical depen- 
dence between secret data and side-channel leaks through randomization. Although 
various masked implementations have been proposed, e.g., for AES or its non-linear 
components (S-boxes) [15,37,51,52], checking if they are correct is always tedious 
and error-prone. Indeed, there are published implementations [51,52] later shown to be 
incorrect [21,22]. Therefore, formally verifying these countermeasures is important. 

Previously, there were two types of verification methods for masking
countermeasures [54]: one is type inference based [10,44] and the other is model counting
based [26,27]. Type inference based methods [10,44] are fast and sound, meaning they 
can quickly prove the computation is leakage free, e.g., if the result is syntactically inde- 
pendent of the secret data or has been masked by random variables not used elsewhere. 
However, syntactic type inference is not complete in that it may report false positives. 
In contrast, model counting based methods [26,27] are sound and complete: they check 
if the computation is statistically independent of the secret [15]. However, due to the 
inherent complexity of model counting, they can be extremely slow in practice. 

The aforementioned gap, in terms of both accuracy and scalability, has not been 
bridged by more recent approaches [6,13,47]. For example, Barthe et al. [6] proposed
some inference rules to prove masking countermeasures based on the observation that 
certain operators (e.g., XOR) are invertible: in the absence of such operators, purely 
algebraic laws can be used to normalize expressions of computation results to apply the 
rules of invertible functions. This normalization is applied to each expression once, as it 
is costly. Ouahma et al. [47] introduced a linear-time algorithm based on finer-grained 
syntactical inference rules. A similar idea was explored by Bisi et al. [13] for analyzing 
higher-order masking: like in [6,47], however, the method is not complete, and does not 
consider non-linear operators which are common in cryptographic software. 


Our Contribution. We propose a refinement-based approach, named SCInfer, to bridge
the gap between prior techniques, which are either fast but inaccurate or accurate but
slow. Figure 1 depicts the overall flow, where the input consists of the program and a
set of variables marked as public, private, or random. We first transform the program 
to an intermediate representation: the data dependency graph (DDG). Then, we tra- 
verse the DDG in a topological order to infer a distribution type for each intermediate 
computation result. Next, we check if all intermediate computation results are perfectly 
masked according to their types. If any of them cannot be resolved in this way, we 
invoke an SMT solver based refinement procedure, which leverages either satisfiabil- 
ity (SAT) solving or model counting (SAT#) to prove leakage freedom. In both cases, 
the result is fed back to improve the type system. Finally, based on the refined type
inference rules, we continue to analyze other intermediate computation results. 
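The flow of Fig. 1 (cheap type inference first, expensive solver-based analysis only for unresolved results) can be illustrated with a small Python sketch. Expressions are hypothetical nested tuples such as `("xor", "k", "r2")`. The two rules in `cheap_rules` are simplified examples of sound syntactic rules, not SCInfer's rule set, and the exhaustive enumeration in `exhaustive_check` stands in for the SAT/SAT# steps; it is only feasible for a handful of variables.

```python
from itertools import product


def supp(e):
    """Structural support: the set of variable names occurring in expression e."""
    if isinstance(e, str):
        return {e}
    return set().union(*(supp(arg) for arg in e[1:]))


def evaluate(e, env):
    """Evaluate a nested-tuple Boolean expression under assignment env."""
    if isinstance(e, str):
        return env[e]
    op, *args = e
    vals = [evaluate(a, env) for a in args]
    if op == "not":
        return not vals[0]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    if op == "xor":
        return vals[0] != vals[1]
    raise ValueError(f"unknown operator {op}")


def cheap_rules(e, kind):
    """Two simplified, sound syntactic rules; None means inconclusive."""
    if not any(kind[v] == "secret" for v in supp(e)):
        return "no-key"                       # does not mention any secret
    if not isinstance(e, str) and e[0] == "xor":
        for mask, rest in ((e[1], e[2]), (e[2], e[1])):
            if isinstance(mask, str) and kind[mask] == "random" and mask not in supp(rest):
                return "uniform"              # rest XOR a fresh random bit
    return None


def exhaustive_check(e, kind):
    """Stand-in for SAT#: the number of random assignments making e true must not
    depend on the secret bits (for each fixed value of the public bits)."""
    vs = sorted(supp(e))
    rnd = [v for v in vs if kind[v] == "random"]
    fixed = [v for v in vs if kind[v] != "random"]
    counts = {}
    for fbits in product([False, True], repeat=len(fixed)):
        env = dict(zip(fixed, fbits))
        ones = sum(evaluate(e, {**env, **dict(zip(rnd, rbits))})
                   for rbits in product([False, True], repeat=len(rnd)))
        public = tuple(env[v] for v in fixed if kind[v] == "public")
        counts.setdefault(public, set()).add(ones)
    return "masked" if all(len(c) == 1 for c in counts.values()) else "leak"


def verify(e, kind):
    """Try the cheap rules first; fall back to exhaustive counting only if needed."""
    return cheap_rules(e, kind) or exhaustive_check(e, kind)
```

For an expression such as $((r_1 \oplus r_2) \oplus (k \oplus r_2)) \oplus (r_1 \oplus r_2)$ the syntactic rules are inconclusive, and only the fallback establishes that it is perfectly masked, which is the kind of gap the refinement is meant to close.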

Thus, SCInfer can be viewed as a synergistic integration of a semantic rule based
approach for inferring distribution types and an SMT solver based approach for refining 
these inference rules. Our type inference rules (Sect. 3) are inspired by Barthe et al. [6] 
and Ouahma et al. [47] in that they are designed to infer distribution types of interme- 
diate computation results. However, there is a crucial difference: their inference rules 
are syntactic with fixed accuracy, i.e., relying solely on structural information of the 
program, whereas ours are semantic and the accuracy can be gradually improved with 
the aid of our SMT solver based analysis. At a high level, our semantic type inference 
rules subsume their syntactic type inference rules. 

The main advantage of using type inference is the ability to quickly obtain sound
proofs: when there is no leak in the computation, the type system can often
produce a proof quickly; furthermore, the result is always conclusive. However, if type
inference fails to produce a proof, the verification problem remains unresolved. Thus,
to be complete, we propose to leverage SMT solvers to resolve these left-over
verification problems. Here, solvers are used either to check the satisfiability (SAT) of a
logical formula or to count its satisfying solutions (SAT#); the latter, although
expensive, is powerful enough to completely decide if the computation is perfectly masked.
Finally, by feeding solver results back to the type inference system, we can gradually 
improve its accuracy. Thus, overall, the method is both sound and complete. 

We have implemented our method in a software tool named SCInfer and evaluated
it on publicly available benchmarks [26,27], which implement various cryptographic
algorithms such as AES and MAC-Keccak. Our experiments show that SCInfer is both
effective in obtaining proofs quickly and scalable for handling realistic applications.
Specifically, it can resolve most of the verification subproblems using type inference
and, as a result, satisfiability (SAT) based analysis needs to be applied to only a few
left-over cases. Only in rare cases does the most heavyweight analysis (SAT#) need to be invoked.

To sum up, the main contributions of this work are as follows: 


— We propose a new semantic type inference approach for verifying masking counter- 
measures. It is sound and efficient for obtaining proofs. 

— We propose a method for gradually refining the type inference system using SMT 
solver based analysis, to ensure the overall method is complete. 

— We implement the proposed techniques in a tool named SCInfer and demonstrate
its efficiency and effectiveness on cryptographic benchmarks. 


The remainder of this paper is organized as follows. After reviewing the basics in 
Sect. 2, we present our semantic type inference system in Sect. 3 and our refinement
method in Sect. 4. Then, we present our experimental results in Sect. 5 and comparison 
with related work in Sect. 6. We give our conclusions in Sect. 7. 


2 Preliminaries 


In this section, we define the type of programs considered in this work and then review 
the basics of side-channel attacks and masking countermeasures. 
2.1 Probabilistic Boolean Programs


Following the notation used in [15,26,27], we assume that the program P implements 
a cryptographic function, e.g., $c \leftarrow P(p, k)$ where $p$ is the plaintext, $k$ is the secret key
and c is the ciphertext. Inside P, random variable r may be used to mask the secret 
key while maintaining the input-output behavior of P. Therefore, P may be viewed 
as a probabilistic program. Since loops, function calls, and branches may be removed 
via automated rewriting [26,27] and integer variables may be converted to bits, for 
verification purposes, we assume that P is a straight-line probabilistic Boolean program, 
where each instruction has a unique label and at most two operands. 

Let $\mathbf{k}$ (resp. $\mathbf{r}$) be the set of secret (resp. random) bits, $\mathbf{p}$ the
public bits, and $\mathbf{c}$ the variables storing intermediate results. Thus, the set of
variables is $V = \mathbf{k} \cup \mathbf{r} \cup \mathbf{p} \cup \mathbf{c}$. In addition, the
program uses a set $op$ of operators including negation ($\neg$), and ($\land$), or ($\lor$),
and exclusive-or ($\oplus$). A computation of $P$ is a sequence
$c_1 \leftarrow i_1(\mathbf{p}, \mathbf{k}, \mathbf{r}); \ldots; c_n \leftarrow i_n(\mathbf{p}, \mathbf{k}, \mathbf{r})$
where, for each $1 \le i \le n$, the value of $c_i$ is expressed in terms of $\mathbf{p}$, $\mathbf{k}$
and $\mathbf{r}$. Each random bit in $\mathbf{r}$ is uniformly distributed in $\{0, 1\}$; the sole
purpose of using them in $P$ is to ensure that $c_1, \ldots, c_n$ are statistically
independent of the secret $\mathbf{k}$.

    bool compute(bool r1, bool r2, bool r3, bool k)
    {
        bool c1, c2, c3, c4, c5, c6;
        c1 = k ⊕ r2;
        c2 = r1 ⊕ r2;
        c3 = c2 ⊕ c1;
        c4 = c3 ⊕ c2;
        c5 = c4 ⊕ r1;
        c6 = c5 ∧ r3;
        return c6;
    }

(Right: the DDG of compute().)

Fig. 2. An example for masking countermeasure.


Data Dependency Graph (DDG). Our internal representation of $P$ is a graph
$G_P = (N, E, \lambda)$, where $N$ is the set of nodes, $E$ is the set of edges, and
$\lambda$ is a labeling function.


- N = L ⊎ L_V, where L is the set of instructions in P and L_V is the set of terminal nodes: l_v ∈ L_V corresponds to a variable or constant v ∈ k ∪ r ∪ p ∪ {0, 1}. 

- E ⊆ N × N contains edge (l, l') if and only if l : c = x ∘ y, where either x or y is defined by l'; or l : c = ¬x, where x is defined by l'; 

- λ maps each l ∈ N to a pair (val, op): λ(l) = (c, ∘) for l : c = x ∘ y; λ(l) = (c, ¬) for l : c = ¬x; and λ(l) = (v, ⊥) for each terminal node l_v. 


We may use λ1(l) = c and λ2(l) = ∘ to denote the first and second elements of the pair λ(l) = (c, ∘), respectively. We may also use l.lft to denote the left child of l, and l.rgt to denote the right child if it exists. A subtree rooted at node l corresponds to an intermediate computation result. When the context is clear, we may use the following terms interchangeably: a node l, the subtree T rooted at l, and the intermediate computation result c = λ1(l). Let |P| denote the total number of nodes in the DDG. 

Figure 2 shows an example where k = {k}, r = {r1, r2, r3}, c = {c1, c2, c3, c4, c5, c6} and p = ∅. On the left is a program written in a C-like language except that ⊕ denotes XOR and ∧ denotes AND. On the right is the DDG, where 
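$$
\begin{aligned}
c_3 &= c_2 \oplus c_1 = (r_1 \oplus r_2) \oplus (k \oplus r_2) = k \oplus r_1,\\
c_4 &= c_3 \oplus c_2 = (r_1 \oplus r_2) \oplus (k \oplus r_2) \oplus (r_1 \oplus r_2) = k \oplus r_2,\\
c_5 &= c_4 \oplus r_1 = \big((r_1 \oplus r_2) \oplus (k \oplus r_2) \oplus (r_1 \oplus r_2)\big) \oplus r_1 = k \oplus r_1 \oplus r_2,\\
c_6 &= c_5 \wedge r_3 = \Big(\big((r_1 \oplus r_2) \oplus (k \oplus r_2) \oplus (r_1 \oplus r_2)\big) \oplus r_1\Big) \wedge r_3 = (k \oplus r_1 \oplus r_2) \wedge r_3.
\end{aligned}
$$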


Let supp : N → k ∪ r ∪ p be a function mapping each node l to its support variables. That is, supp(l) = ∅ if λ1(l) ∈ {0, 1}; supp(l) = {x} if λ1(l) = x ∈ k ∪ r ∪ p; and supp(l) = supp(l.lft) ∪ supp(l.rgt) otherwise. Thus, the function returns a set of variables that λ1(l) depends upon structurally. 
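As a concrete illustration, the sketch below encodes the DDG of Fig. 2 as a Python dictionary (an illustrative encoding assumed here, not the representation used by the tool) and computes supp by structural recursion.

```python
# Hypothetical encoding of the DDG of Fig. 2: each internal node maps to
# (operator, left child, right child); any name that is not a key is a
# terminal node labeled with a variable.
PROGRAM = {
    "c1": ("xor", "k", "r2"),
    "c2": ("xor", "r1", "r2"),
    "c3": ("xor", "c2", "c1"),
    "c4": ("xor", "c3", "c2"),
    "c5": ("xor", "c4", "r1"),
    "c6": ("and", "c5", "r3"),
}


def supp(node, program=PROGRAM):
    """Structural support of `node`: every variable reachable from it in the DDG."""
    if node in (0, 1):  # Boolean constants have an empty support
        return frozenset()
    if node not in program:  # terminal node labeled with a variable
        return frozenset({node})
    _, left, right = program[node]
    return supp(left, program) | supp(right, program)


for name in PROGRAM:
    print(name, sorted(supp(name)))
```

For c4 it returns {k, r1, r2}, even though c4 = k ⊕ r2 depends semantically on only one random bit.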

Given a node l whose corresponding expression e is defined in terms of variables in V, we say that e is semantically dependent on a variable r ∈ V if and only if there exist two assignments, π1 and π2, such that π1(r) ≠ π2(r) and π1(x) = π2(x) for every x ∈ V \ {r}, and the values of e differ under π1 and π2. 

Let semd : N → r be a function such that semd(l) denotes the set of random variables upon which the expression e of l semantically depends. Thus, semd(l) ⊆ supp(l); and for each r ∈ supp(l) \ semd(l), we know λ1(l) is semantically independent of r. More importantly, there is often a gap between supp(l) ∩ r and semd(l), namely semd(l) ⊂ supp(l) ∩ r, which is why our gradual refinement of semantic type inference rules can outperform methods based solely on syntactic type inference. 

Consider the node l_{c4} in Fig. 2: we have supp(l_{c4}) = {r1, r2, k}, semd(l_{c4}) = {r2}, and supp(l_{c4}) ∩ r = {r1, r2}. Furthermore, if the random bits are uniformly distributed in {0, 1}, then c4 is both uniformly distributed and secret independent (Sect. 2.2). 


2.2 Side-Channel Attacks and Masking 


We assume the adversary has access to the public input p and output c, but not the secret k and random variable r, of the program c ← P(p, k). However, the adversary may have access to side-channel leaks that reveal the joint distribution of at most d intermediate computation results c1, ..., cd (e.g., via differential power analysis [39]). Under these assumptions, the goal of the adversary is to deduce information about k. To model the leakage of each instruction, we consider a widely used, value-based model, called the Hamming Weight (HW) model; other power leakage models such as the transition-based model [5] can be used similarly [6]. 

Let [n] denote the set {1, ..., n} of natural numbers where n ≥ 1. We call a set with d elements a d-set. Given concrete values p and k for the public and secret bits, and a d-set {c1, ..., cd} of intermediate computation results, we use D_{p,k}(c1, ..., cd) to denote their joint distribution induced by instantiating the public and secret bits with p and k, respectively. Formally, for each vector of values v1, ..., vd in the probability space {0, 1}^d, we have 

$$
\mathcal{D}_{p,k}(c_1,\ldots,c_d)(v_1,\ldots,v_d) = \frac{\left|\left\{\, r \in \{0,1\}^{|\mathbf{r}|} \;:\; v_1 = i_1(\mathbf{p}=p,\mathbf{k}=k,\mathbf{r}=r),\ \ldots,\ v_d = i_d(\mathbf{p}=p,\mathbf{k}=k,\mathbf{r}=r) \,\right\}\right|}{2^{|\mathbf{r}|}}.
$$
Definition 1. We say a d-set {c1, ..., cd} of intermediate computation results is 
- uniformly distributed if D_{p,k}(c1, ..., cd) is a uniform distribution for any p and k; 
- secret independent if D_{p,k}(c1, ..., cd) = D_{p,k'}(c1, ..., cd) for any (p, k) and (p, k'). 
Note that there is a difference between them: a uniformly distributed d-set is always secret independent, but a secret-independent d-set is not always uniformly distributed. 
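For tiny programs, Definition 1 can be checked by exhaustive enumeration. The following sketch (exponential in |r| and intended only as an illustration; all names are hypothetical) builds D_{p,k} exactly with fractions for the running example, where p = ∅, so only k is instantiated. It shows that c4 is uniformly distributed, whereas c6 = (k ⊕ r1 ⊕ r2) ∧ r3 is secret independent but not uniform, since it equals 1 with probability 1/4 for either value of k.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical encoding of the Fig. 2 program; p is empty in this example.
PROGRAM = {
    "c1": ("xor", "k", "r2"),
    "c2": ("xor", "r1", "r2"),
    "c3": ("xor", "c2", "c1"),
    "c4": ("xor", "c3", "c2"),
    "c5": ("xor", "c4", "r1"),
    "c6": ("and", "c5", "r3"),
}
RANDOM = ("r1", "r2", "r3")


def evaluate(node, env):
    """Value of `node` under a full assignment `env` of the input bits."""
    if node not in PROGRAM:
        return env[node]
    op, left, right = PROGRAM[node]
    a, b = evaluate(left, env), evaluate(right, env)
    return {"xor": a ^ b, "and": a & b, "or": a | b}[op]


def distribution(nodes, k):
    """Exact joint distribution D_{p,k}(nodes) over uniformly random r."""
    hist = Counter()
    for bits in product((0, 1), repeat=len(RANDOM)):
        env = {"k": k, **dict(zip(RANDOM, bits))}
        hist[tuple(evaluate(n, env) for n in nodes)] += 1
    total = 2 ** len(RANDOM)
    return {value: Fraction(count, total) for value, count in hist.items()}


def is_uniform(nodes):
    """Every value in {0,1}^d has probability 2^-d, for every secret value k."""
    expected = Fraction(1, 2 ** len(nodes))
    for k in (0, 1):
        dist = distribution(nodes, k)
        if len(dist) != 2 ** len(nodes) or any(prob != expected for prob in dist.values()):
            return False
    return True


def is_secret_independent(nodes):
    """The distribution is the same for k = 0 and k = 1."""
    return distribution(nodes, 0) == distribution(nodes, 1)


print(is_uniform(["c4"]), is_secret_independent(["c6"]), is_uniform(["c6"]))  # True True False
```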


Definition 2. A program P is order-d perfectly masked if every k-set {c1, ..., ck} of P such that k ≤ d is secret independent. When P is (order-1) perfectly masked, we may simply say it is perfectly masked. 


To decide whether P is order-d perfectly masked, it suffices to check whether there exist a d-set and two pairs (p, k) and (p, k') such that D_{p,k}(c1, ..., cd) ≠ D_{p,k'}(c1, ..., cd). In this context, the main challenge is computing D_{p,k}(c1, ..., cd), which is essentially a model-counting (SAT#) problem. In the remainder of this paper, we focus on developing an efficient method for verifying (order-1) perfect masking, although our method can be extended to higher-order masking as well. 
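A naive decision procedure follows directly from Definition 2: enumerate every j-set with j ≤ d and compare its model counts across secret values. The sketch below does exactly that on the running example (the helper names are hypothetical). It reports that the program is first-order perfectly masked but not second-order, since the pair {c2, c5} reveals k through c2 ⊕ c5 = k. The nested enumeration over sets and random assignments is what makes this approach impractical beyond toy sizes.

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical encoding of the Fig. 2 program.
PROGRAM = {
    "c1": ("xor", "k", "r2"),
    "c2": ("xor", "r1", "r2"),
    "c3": ("xor", "c2", "c1"),
    "c4": ("xor", "c3", "c2"),
    "c5": ("xor", "c4", "r1"),
    "c6": ("and", "c5", "r3"),
}
SECRET = ("k",)
RANDOM = ("r1", "r2", "r3")


def evaluate(node, env):
    if node not in PROGRAM:
        return env[node]
    op, left, right = PROGRAM[node]
    a, b = evaluate(left, env), evaluate(right, env)
    return {"xor": a ^ b, "and": a & b, "or": a | b}[op]


def model_counts(nodes, secret_bits):
    """Unnormalized D_{p,k}(nodes): how many r-assignments yield each output vector."""
    hist = Counter()
    for bits in product((0, 1), repeat=len(RANDOM)):
        env = {**dict(zip(SECRET, secret_bits)), **dict(zip(RANDOM, bits))}
        hist[tuple(evaluate(n, env) for n in nodes)] += 1
    return hist


def leaky_sets(size):
    """All `size`-sets of intermediate results whose joint distribution depends on the secret."""
    secrets = list(product((0, 1), repeat=len(SECRET)))
    leaks = []
    for nodes in combinations(PROGRAM, size):
        reference = model_counts(nodes, secrets[0])
        if any(model_counts(nodes, s) != reference for s in secrets[1:]):
            leaks.append(nodes)
    return leaks


def is_order_d_perfectly_masked(d):
    """Definition 2: every j-set with j <= d must be secret independent."""
    return all(not leaky_sets(j) for j in range(1, d + 1))


print(is_order_d_perfectly_masked(1), is_order_d_perfectly_masked(2))  # True False
```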


Gap in Current State of Knowledge. Existing methods for verifying masking coun- 
termeasures are either fast but inaccurate, e.g., when they rely solely on syntactic type 
inference (structural information provided by supp in Sect. 2.1) or accurate but slow, 
e.g., when they rely solely on model-counting. In contrast, our method gradually refines 
a set of semantic type-inference rules (i.e., using semd instead of supp as defined in 
Sect. 2.1) where constraint solvers (SAT and SAT#) are used on demand to resolve 
ambiguity and improve the accuracy of type inference. As a result, it can achieve the 
best of both worlds. 


3 The Semantic Type Inference System 


We first introduce our distribution types, which are inspired by prior work in [6, 13,47], 
together with some auxiliary data structures; then, we present our inference rules. 


3.1 The Type System 


Let T = {CST, RUD, SID, NPM, UKD} be the set of distribution types for intermediate computation results, where [[c]] denotes the type of c ← i(p, k, r). Specifically, 


- [[c]] = CST means c is a constant, which implies that it is side-channel leak-free; 
- [[c]] = RUD means c is randomized to uniform distribution, and hence leak-free; 
- [[c]] = SID means c is secret independent, i.e., perfectly masked; 
- [[c]] = NPM means c is not perfectly masked and thus has leaks; and 
- [[c]] = UKD means c has an unknown distribution. 


Definition 3. Let unq : N → r and dom : N → r be two functions such that (i) for each terminal node l ∈ L_V, if λ1(l) ∈ r, then unq(l) = dom(l) = {λ1(l)}; otherwise unq(l) = dom(l) = supp(l) ∩ r = ∅; and (ii) for each internal node l ∈ L, we have 

- unq(l) = (unq(l.lft) ∪ unq(l.rgt)) \ (supp(l.lft) ∩ supp(l.rgt)); 
- dom(l) = (dom(l.lft) ∪ dom(l.rgt)) ∩ unq(l) if λ2(l) = ⊕; but dom(l) = ∅ otherwise. 
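The recursive definitions of unq and dom translate almost literally into code. The sketch below uses the same hypothetical dictionary encoding of Fig. 2 for convenience, memoized with `functools.lru_cache` so that each node's sets are computed once, and evaluates both functions on the running example.

```python
from functools import lru_cache

# Hypothetical encoding of the Fig. 2 DDG.
PROGRAM = {
    "c1": ("xor", "k", "r2"),
    "c2": ("xor", "r1", "r2"),
    "c3": ("xor", "c2", "c1"),
    "c4": ("xor", "c3", "c2"),
    "c5": ("xor", "c4", "r1"),
    "c6": ("and", "c5", "r3"),
}
RANDOM = frozenset({"r1", "r2", "r3"})


@lru_cache(maxsize=None)
def supp(node):
    if node not in PROGRAM:
        return frozenset({node})
    _, left, right = PROGRAM[node]
    return supp(left) | supp(right)


@lru_cache(maxsize=None)
def unq(node):
    """Definition 3: unique random bits of the children, minus those in both supports."""
    if node not in PROGRAM:
        return supp(node) & RANDOM  # {r} for a random leaf, empty otherwise
    _, left, right = PROGRAM[node]
    return (unq(left) | unq(right)) - (supp(left) & supp(right))


@lru_cache(maxsize=None)
def dom(node):
    """Definition 3: dominant random bits, non-empty only at XOR nodes."""
    if node not in PROGRAM:
        return supp(node) & RANDOM
    op, left, right = PROGRAM[node]
    if op != "xor":
        return frozenset()
    return (dom(left) | dom(right)) & unq(node)


for name in PROGRAM:
    print(name, sorted(unq(name)), sorted(dom(name)))
```

It yields dom(c1) = {r2}, dom(c2) = {r1, r2}, dom(c3) = {r1}, and an empty dom for c4, c5 and c6: c6 is an AND node, and c4 and c5 have no unique random bit left.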
  λ1(l) ∈ r  ⟹  [[l]] = RUD
  λ1(l) ∈ p ∪ k  ⟹  [[l]] = UKD
  λ1(l) ∈ {0, 1}  ⟹  [[l]] = CST
  Xor-Rud1:  λ2(l) = ⊕, [[l.lft]] = RUD, dom(l.lft) \ semd(l.rgt) ≠ ∅  ⟹  [[l]] = RUD
  Xor-Rud2:  λ2(l) = ⊕, [[l.rgt]] = RUD, dom(l.rgt) \ semd(l.lft) ≠ ∅  ⟹  [[l]] = RUD
  Ao-Rud1:   λ2(l) ∈ {∧, ∨}, [[l.lft]] = RUD, [[l.rgt]] ∉ {UKD, NPM}, semd(l.lft) ∩ semd(l.rgt) = ∅  ⟹  [[l]] = SID
  Ao-Rud2:   λ2(l) ∈ {∧, ∨}, [[l.rgt]] = RUD, [[l.lft]] ∉ {UKD, NPM}, semd(l.rgt) ∩ semd(l.lft) = ∅  ⟹  [[l]] = SID
  Ao-Rud3:   λ2(l) ∈ {∧, ∨}, [[l.lft]] = [[l.rgt]] = RUD, (dom(l.lft) \ semd(l.rgt)) ∪ (dom(l.rgt) \ semd(l.lft)) ≠ ∅  ⟹  [[l]] = SID
  Sid:       λ2(l) ∈ {⊕, ∧, ∨}, [[l.rgt]] = [[l.lft]] = SID, semd(l.lft) ∩ semd(l.rgt) = ∅  ⟹  [[l]] = SID
  λ2(l) = ¬  ⟹  [[l]] = [[l.lft]]
  No-Key:    supp(l) ∩ k = ∅  ⟹  [[l]] = SID
  Ukd:       no rule is applicable at l  ⟹  [[l]] = UKD

Fig. 3. Our semantic type-inference rules. No rule here infers the NPM type; the NPM inference rules are added in Fig. 5, since they rely on the results of SMT-solver-based analyses. 


Both unq(l) and dom(l) are computable in time linear in |P| [47]. Following the proofs in [6,47], it is easy to reach the following observations. Given an intermediate computation result c ← i(p, k, r) labeled by l, the following statements hold: 

- if dom(l) ≠ ∅, then [[c]] = RUD; 
- if [[c]] = RUD, then [[¬c]] = RUD; if [[c]] = SID, then [[¬c]] = SID; 
- if r ∉ semd(l) for a random bit r ∈ r, then [[r ⊕ c]] = RUD; 
- for every c' ← i'(p, k, r) labeled by l', if semd(l) ∩ semd(l') = ∅ and [[c]] = [[c']] = SID, then [[c ∘ c']] = SID. 


Figure 3 shows our type inference rules that concretize these observations. When multiple rules could be applied to a node l ∈ N, we always choose the rules that can lead to [[l]] = RUD. If no rule is applicable at l, we set [[l]] = UKD. When the context is clear, we may use [[l]] and [[c]] interchangeably for λ1(l) = c. The correctness of these inference rules follows directly from the definitions. 


Theorem 1. For every intermediate computation result c ← i(p, k, r) labeled by l, 

- if [[c]] = RUD, then c is uniformly distributed, and hence perfectly masked; 
- if [[c]] = SID, then c is guaranteed to be perfectly masked. 


To improve efficiency, our inference rules may be applied twice, first using the supp function, which extracts structural information from the program (cf. Sect. 2.1), and then using the semd function, which is slower to compute but also significantly more accurate. Since semd(l) ⊆ supp(l) for all l ∈ N, this is always sound. Moreover, type inference is invoked for the second time only if, after the first time, [[l]] remains UKD. 


Example 1. When using type inference with supp on the running example, we have 

[[r1]] = [[r2]] = [[r3]] = [[c1]] = [[c2]] = [[c3]] = RUD,  [[k]] = [[c4]] = [[c5]] = [[c6]] = UKD. 

When using type inference with semd (for the second time), we have 

[[r1]] = [[r2]] = [[r3]] = [[c1]] = [[c2]] = [[c3]] = [[c4]] = [[c5]] = RUD,  [[k]] = UKD,  [[c6]] = SID. 
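Example 1 can be replayed with a compact rendering of the Fig. 3 rules. This is a sketch under simplifying assumptions: semd is computed by exhaustive enumeration instead of the SAT-based check of Sect. 3.2, the rules are tried in a fixed order (the RUD-producing Xor rules first), and the second pass simply reruns the rules on every node with f = semd rather than only on the UKD ones. The encoding and function names are hypothetical.

```python
from functools import lru_cache
from itertools import product

# Hypothetical encoding of the Fig. 2 DDG (dict insertion order is topological).
PROGRAM = {
    "c1": ("xor", "k", "r2"),
    "c2": ("xor", "r1", "r2"),
    "c3": ("xor", "c2", "c1"),
    "c4": ("xor", "c3", "c2"),
    "c5": ("xor", "c4", "r1"),
    "c6": ("and", "c5", "r3"),
}
SECRET = frozenset({"k"})
RANDOM = frozenset({"r1", "r2", "r3"})
VARIABLES = ("k", "r1", "r2", "r3")


def evaluate(node, env):
    if node not in PROGRAM:
        return env[node]
    op, left, right = PROGRAM[node]
    a, b = evaluate(left, env), evaluate(right, env)
    return {"xor": a ^ b, "and": a & b, "or": a | b}[op]


@lru_cache(maxsize=None)
def supp(node):
    if node not in PROGRAM:
        return frozenset({node})
    _, left, right = PROGRAM[node]
    return supp(left) | supp(right)


@lru_cache(maxsize=None)
def semd(node):
    """Random bits `node` semantically depends on (exhaustive search replaces SAT)."""
    deps = set()
    for r in RANDOM:
        others = [v for v in VARIABLES if v != r]
        for bits in product((0, 1), repeat=len(others)):
            env = dict(zip(others, bits))
            if evaluate(node, {**env, r: 0}) != evaluate(node, {**env, r: 1}):
                deps.add(r)
                break
    return frozenset(deps)


@lru_cache(maxsize=None)
def unq(node):
    if node not in PROGRAM:
        return supp(node) & RANDOM
    _, left, right = PROGRAM[node]
    return (unq(left) | unq(right)) - (supp(left) & supp(right))


@lru_cache(maxsize=None)
def dom(node):
    if node not in PROGRAM:
        return supp(node) & RANDOM
    op, left, right = PROGRAM[node]
    return (dom(left) | dom(right)) & unq(node) if op == "xor" else frozenset()


def infer_types(f):
    """One topological pass of the Fig. 3 rules, with f = supp or f = semd."""
    ty = {v: ("RUD" if v in RANDOM else "UKD") for v in VARIABLES}  # leaf rules
    for l, (op, a, b) in PROGRAM.items():
        fa, fb = f(a) & RANDOM, f(b) & RANDOM
        ok_a, ok_b = ty[a] not in ("UKD", "NPM"), ty[b] not in ("UKD", "NPM")
        if op == "xor" and ((ty[a] == "RUD" and dom(a) - fb) or (ty[b] == "RUD" and dom(b) - fa)):
            ty[l] = "RUD"  # Xor-Rud1 / Xor-Rud2
        elif op != "xor" and not (fa & fb) and ((ty[a] == "RUD" and ok_b) or (ty[b] == "RUD" and ok_a)):
            ty[l] = "SID"  # Ao-Rud1 / Ao-Rud2
        elif op != "xor" and ty[a] == ty[b] == "RUD" and ((dom(a) - fb) | (dom(b) - fa)):
            ty[l] = "SID"  # Ao-Rud3
        elif ty[a] == ty[b] == "SID" and not (fa & fb):
            ty[l] = "SID"  # Sid
        elif not (supp(l) & SECRET):
            ty[l] = "SID"  # No-Key
        else:
            ty[l] = "UKD"  # no rule applies
    return ty


first, second = infer_types(supp), infer_types(semd)
print({c: first[c] for c in PROGRAM})   # c4, c5, c6 are UKD after the supp pass
print({c: second[c] for c in PROGRAM})  # c1..c5 are RUD and c6 is SID after the semd pass
```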
3.2 Checking Semantic Independence 


Unlike supp(l), which only extracts structural information from the program and hence may be computed syntactically, semd(l) is more expensive to compute. In this subsection, we present a method that leverages the SMT solver to check, for any intermediate computation result c ← i(p, k, r) and any random bit r ∈ r, whether c is semantically dependent on r. Specifically, we formulate it as a satisfiability (SAT) problem (formula Φ_r) defined as follows: 


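$$
\Phi_r \;:=\; \Theta^{r=0}_c(c_0, \mathbf{p}, \mathbf{k}, \mathbf{r}\setminus\{r\}) \wedge \Theta^{r=1}_c(c_1, \mathbf{p}, \mathbf{k}, \mathbf{r}\setminus\{r\}) \wedge \Theta_{\neq}(c_0, c_1),
$$

where Θ^{r=0}_c (resp. Θ^{r=1}_c) encodes the relation i(p, k, r) with r replaced by 0 (resp. 1), c0 and c1 are copies of c, and Θ_≠ asserts that the outputs differ even under the same inputs. 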

In logic synthesis and optimization, when r ∉ semd(l), r is called a don't care variable [36]. Therefore, it is easy to see why the following theorem holds. 


Theorem 2. Φ_r is unsatisfiable iff the value of r does not affect the value of c, i.e., c is semantically independent of r. Moreover, the formula size of Φ_r is linear in |P|. 
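The formula Φ_r can be mimicked on expression trees: substitute r by 0 and by 1 to obtain the two copies, then search for an input on which they differ. In the sketch below, an exhaustive search over the remaining inputs stands in for the SAT solver (an assumption made to keep the example self-contained); the tuple encoding of c4 and the function names are likewise illustrative.

```python
from itertools import product

# c4 of Fig. 2 as a hypothetical expression tree: ((r1 ^ r2) ^ (k ^ r2)) ^ (r1 ^ r2)
C4 = ("xor", ("xor", ("xor", "r1", "r2"), ("xor", "k", "r2")), ("xor", "r1", "r2"))
INPUTS = ("k", "r1", "r2")


def substitute(expr, var, value):
    """Theta_c^{r=value}: a copy of the expression with `var` fixed to a constant."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op, substitute(left, var, value), substitute(right, var, value))
    return value if expr == var else expr


def evaluate(expr, env):
    if isinstance(expr, tuple):
        op, left, right = expr
        a, b = evaluate(left, env), evaluate(right, env)
        return {"xor": a ^ b, "and": a & b, "or": a | b}[op]
    return expr if expr in (0, 1) else env[expr]


def phi_r_model(expr, var, inputs=INPUTS):
    """Return a model of Phi_r (c depends on var), or None if Phi_r is unsatisfiable."""
    copy0, copy1 = substitute(expr, var, 0), substitute(expr, var, 1)
    others = [v for v in inputs if v != var]
    for bits in product((0, 1), repeat=len(others)):
        env = dict(zip(others, bits))
        if evaluate(copy0, env) != evaluate(copy1, env):  # Theta_neq(c0, c1) holds
            return env
    return None


print(phi_r_model(C4, "r1"), phi_r_model(C4, "r2") is not None)  # None True
```

Since the miter for r1 has no model, r1 is a don't care variable of c4, in line with semd(l_{c4}) = {r2}.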


  Cp-Rud:   [[c1, ..., ck]] = RUD, [[ck+1]] = RUD, semd(c1, ..., ck) ∩ semd(ck+1) = ∅  ⟹  [[c1, ..., ck+1]] = RUD
  Cp-Sid1:  [[c1, ..., ck]], [[ck+1]] ∈ {SID, RUD}, [[ck+1]] ≠ [[c1, ..., ck]], semd(c1, ..., ck) ∩ semd(ck+1) = ∅  ⟹  [[c1, ..., ck+1]] = SID
  Cp-Sid2:  [[c1, ..., ck]] = RUD, [[ck+1]] = RUD, (dom(c1, ..., ck) \ semd(ck+1)) ∪ (dom(ck+1) \ semd(c1, ..., ck)) ≠ ∅  ⟹  [[c1, ..., ck+1]] = SID
  Cp-Ukd:   no rule is applicable at {c1, ..., ck+1}  ⟹  [[c1, ..., ck+1]] = UKD

Fig. 4. Our composition rules for handling sets of intermediate computation results. 


3.3 Verifying Higher-Order Masking 


The type system so far targets first-order masking. We now outline how it extends 
to verify higher-order masking. Generally speaking, we have to check, for any k-set {c1, ..., ck} of intermediate computation results such that k ≤ d, that the joint distribution is either randomized to uniform distribution (RUD) or secret independent (SID). 

To tackle this problem, we lift supp, semd, unq, and dom to sets of computation results as follows: for each k-set {c1, ..., ck}, 

- supp(c1, ..., ck) = ∪_{i∈[k]} supp(ci); 
- semd(c1, ..., ck) = ∪_{i∈[k]} semd(ci); 
- unq(c1, ..., ck) = (∪_{i∈[k]} unq(ci)) \ ∪_{i,j∈[k], i≠j} (supp(ci) ∩ supp(cj)); and 
- dom(c1, ..., ck) = (∪_{i∈[k]} dom(ci)) ∩ unq(c1, ..., ck). 
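The lifted definitions can be checked on small k-sets. The sketch below hard-codes the per-node sets of c1 to c4 from the running example (values derived from supp, semd and Definition 3) and implements the i ≠ j condition of the lifted unq by ranging over unordered pairs, which is equivalent because intersection is symmetric. All names are illustrative.

```python
from itertools import combinations

# Per-node sets of c1..c4 in the running example (Fig. 2).
SUPP = {"c1": {"k", "r2"}, "c2": {"r1", "r2"}, "c3": {"k", "r1", "r2"}, "c4": {"k", "r1", "r2"}}
SEMD = {"c1": {"r2"}, "c2": {"r1", "r2"}, "c3": {"r1"}, "c4": {"r2"}}
UNQ = {"c1": {"r2"}, "c2": {"r1", "r2"}, "c3": {"r1"}, "c4": set()}
DOM = {"c1": {"r2"}, "c2": {"r1", "r2"}, "c3": {"r1"}, "c4": set()}


def lifted_union(table, nodes):
    """Lifted supp or semd: the union of the per-node sets."""
    return set().union(*(table[c] for c in nodes))


def lifted_unq(nodes):
    """Unique bits of the set, minus bits shared by the supports of any two members."""
    shared = set().union(*(SUPP[a] & SUPP[b] for a, b in combinations(nodes, 2)))
    return lifted_union(UNQ, nodes) - shared


def lifted_dom(nodes):
    return lifted_union(DOM, nodes) & lifted_unq(nodes)


pair = ["c1", "c2"]
print(sorted(lifted_union(SEMD, pair)), sorted(lifted_unq(pair)), sorted(lifted_dom(pair)))
```

For {c1, c2}, r2 is shared by both supports, so only r1 survives in the lifted unq and dom.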


Our inference rules are extended by adding the composition rules shown in Fig. 4. 
Theorem 3. For every k-set {c1, ..., ck} of intermediate computation results, 

- if [[c1, ..., ck]] = RUD, then {c1, ..., ck} is guaranteed to be uniformly distributed, and hence perfectly masked; 
- if [[c1, ..., ck]] = SID, then {c1, ..., ck} is guaranteed to be perfectly masked. 


We remark that the semd function in these composition rules could also be safely 
replaced by the supp function, just as before. Furthermore, to more efficiently verify 
that program P is perfectly masked against order-d attacks, we can incrementally apply 
the type inference for each k-set, where k = 1,2,...,d. 


4 The Gradual Refinement Approach 


In this section, we present our method for gradually refining the type inference system 
by leveraging SMT solver based techniques. Adding solvers to the sound type system 
makes it complete as well, thus allowing it to detect side-channel leaks whenever they 
exist, in addition to proving the absence of such leaks. 


4.1 SMT-Based Approach 


For a given computation c ← i(p, k, r), the verification of perfect masking (Definition 2) can be reduced to the satisfiability of the logical formula Ψ defined as follows: 

$$
\Psi \;:=\; \exists \mathbf{p}.\, \exists \mathbf{k}.\, \exists \mathbf{k}'.\; \Big( \sum_{v_r \in \{0,1\}^{|\mathbf{r}|}} i(\mathbf{p}, \mathbf{k}, v_r) \;\neq\; \sum_{v_r \in \{0,1\}^{|\mathbf{r}|}} i(\mathbf{p}, \mathbf{k}', v_r) \Big).
$$

Intuitively, given values (v_p, v_k) of (p, k), count = Σ_{v_r ∈ {0,1}^{|r|}} i(v_p, v_k, v_r) denotes the number of assignments of the random variable r under which i(v_p, v_k, r) evaluates to logical 1. When random bits in r are uniformly distributed in the domain {0, 1}, count / 2^{|r|} is the probability of i(v_p, v_k, r) being logical 1 for the given pair (v_p, v_k). Therefore, Ψ is unsatisfiable if and only if c is perfectly masked. 
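For intuition, Ψ can be decided on toy examples by literally computing both sums. The brute-force sketch below (hypothetical helper names; it enumerates all 2^{|r|} random assignments, which is exactly the blow-up that the SMT encoding inherits) returns a witness (p, k, k') when the counts differ and None when Ψ is unsatisfiable.

```python
from itertools import product


def count_ones(circuit, p, k, n_random):
    """count = sum over v_r in {0,1}^|r| of i(p, k, v_r)."""
    return sum(circuit(p, k, r) for r in product((0, 1), repeat=n_random))


def psi_witness(circuit, n_public, n_secret, n_random):
    """Search for (p, k, k') satisfying Psi; None means c is perfectly masked."""
    for p in product((0, 1), repeat=n_public):
        secrets = list(product((0, 1), repeat=n_secret))
        base = count_ones(circuit, p, secrets[0], n_random)
        for k in secrets[1:]:
            if count_ones(circuit, p, k, n_random) != base:
                return p, secrets[0], k
    return None


def c6(p, k, r):
    """c6 of Fig. 2: (k ^ r1 ^ r2) & r3, with p empty."""
    return (k[0] ^ r[0] ^ r[1]) & r[2]


def leaky(p, k, r):
    """A deliberately leaky (hypothetical) computation: k & r1."""
    return k[0] & r[0]


print(psi_witness(c6, 0, 1, 3))     # None
print(psi_witness(leaky, 0, 1, 1))  # ((), (0,), (1,))
```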
Following Eldib et al. [26,27], we encode the formula Ψ as a quantifier-free first-order logic formula to be solved by an off-the-shelf SMT solver (e.g., Z3): 

$$
\bigwedge_{r=0}^{2^{|\mathbf{r}|}-1} \Theta^{r}_{k} \;\wedge\; \bigwedge_{r=0}^{2^{|\mathbf{r}|}-1} \Theta^{r}_{k'} \;\wedge\; \Theta_{b2i} \;\wedge\; \Theta_{\neq}
$$

- Θ^r_k (resp. Θ^r_{k'}) for each r ∈ {0, ..., 2^{|r|} - 1} encodes a copy of the input-output relation of i(p, k, r) (resp. i(p, k', r)) by replacing r with the concrete value r. There are 2^{|r|} distinct copies for each of k and k', all sharing the same plaintext p. 

- Θ_b2i converts the Boolean outputs of these copies to integers (true becomes 1 and false becomes 0) so that the number of assignments can be counted. 

- Θ_≠ asserts that the two summations, for k and k', differ. 


Example 2. In the running example, for instance, verifying whether node c4 is perfectly masked requires the SMT-based analysis. For brevity, we omit the detailed logical formula while pointing out that, by invoking the SMT solver six times, one can get the following result: [[c1]] = [[c2]] = [[c3]] = [[c4]] = [[c5]] = [[c6]] = SID. 
  Ao-Npm1:  λ2(l) ∈ {∧, ∨}, [[l.rgt]] = NPM, [[l.lft]] = RUD, semd(l.lft) ∩ semd(l.rgt) = ∅  ⟹  [[l]] = NPM
  Ao-Npm2:  λ2(l) ∈ {∧, ∨}, [[l.lft]] = NPM, [[l.rgt]] = RUD, semd(l.rgt) ∩ semd(l.lft) = ∅  ⟹  [[l]] = NPM
  Ao-Npm3:  λ2(l) ∈ {∧, ∨}, [[l.rgt]] = NPM, [[l.lft]] = RUD, dom(l.lft) \ semd(l.rgt) ≠ ∅  ⟹  [[l]] = NPM
  Ao-Npm4:  λ2(l) ∈ {∧, ∨}, [[l.lft]] = NPM, [[l.rgt]] = RUD, dom(l.rgt) \ semd(l.lft) ≠ ∅  ⟹  [[l]] = NPM
  Cp-Npm:   [[ci]] = NPM for some i  ⟹  [[c1, ..., ck]] = NPM


Fig. 5. Complementary rules used during refinement of the type inference (Fig. 3). 


Although the SMT formula size is linear in |P|, the number of distinct copies is exponential in the number of random bits used in the computation. Thus, the approach cannot 
be applied to large programs. To overcome the problem, incremental algorithms [26,27] 
were proposed to reduce the formula size using partitioning and heuristic reduction. 


Incremental SMT-Based Algorithm. Given a computation c ← i(p, k, r) that corresponds to a subtree T rooted at l in the DDG, we search for an internal node l_c in T (a cut-point) such that dom(l_c) ∩ unq(l) ≠ ∅. A cut-point is maximal if there is no other cut-point from l to l_c. Let T' be the simplified tree obtained from T by replacing every subtree rooted by a maximal cut-point with a random variable from dom(l_c) ∩ unq(l). Then, T is SID iff T' is SID. 

The main observation is that, if l_c is a cut-point, there is a random variable r ∈ dom(l_c) ∩ unq(l), which implies that λ1(l_c) is RUD. Here, r ∈ unq(l) implies that λ1(l_c) can be seen as a fresh random variable when we evaluate l. Consider the node c3 in our running example: it is easy to see that r1 ∈ dom(c2) ∩ unq(c3). Therefore, for the purpose of verifying c3, the entire subtree rooted at c2 can be replaced by the random variable r1. 
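The cut-point simplification can be sketched as a tree walk that stops at the first node l_c on each path with dom(l_c) ∩ unq(l) ≠ ∅, which is therefore maximal, and substitutes one random bit from that intersection. The encoding and helper names below are assumptions for illustration; on c3 the walk replaces the subtree rooted at c2 by r1, and on c4 it finds no cut-point at all.

```python
from functools import lru_cache

# Hypothetical encoding of the Fig. 2 DDG.
PROGRAM = {
    "c1": ("xor", "k", "r2"),
    "c2": ("xor", "r1", "r2"),
    "c3": ("xor", "c2", "c1"),
    "c4": ("xor", "c3", "c2"),
    "c5": ("xor", "c4", "r1"),
    "c6": ("and", "c5", "r3"),
}
RANDOM = frozenset({"r1", "r2", "r3"})


@lru_cache(maxsize=None)
def supp(node):
    if node not in PROGRAM:
        return frozenset({node})
    _, left, right = PROGRAM[node]
    return supp(left) | supp(right)


@lru_cache(maxsize=None)
def unq(node):
    if node not in PROGRAM:
        return supp(node) & RANDOM
    _, left, right = PROGRAM[node]
    return (unq(left) | unq(right)) - (supp(left) & supp(right))


@lru_cache(maxsize=None)
def dom(node):
    if node not in PROGRAM:
        return supp(node) & RANDOM
    op, left, right = PROGRAM[node]
    return (dom(left) | dom(right)) & unq(node) if op == "xor" else frozenset()


def simplify(root):
    """Replace every subtree rooted at a maximal cut-point below `root` by a random bit."""
    target = unq(root)

    def walk(node):
        if node not in PROGRAM:
            return node  # terminal node: keep it
        hit = dom(node) & target
        if node != root and hit:
            return min(hit)  # first cut-point on this path, hence maximal
        op, left, right = PROGRAM[node]
        return (op, walk(left), walk(right))

    return walk(root)


print(simplify("c3"))  # ('xor', 'r1', ('xor', 'k', 'r2'))
```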

In addition to partitioning, heuristic rules [26,27] can be used to simplify SMT solving. (1) When constructing the formula Ψ of c, all random variables in supp(l) \ semd(l), which are don't cares, can be replaced by the constant 1 or 0. (2) The No-Key and Sid rules in Fig. 3 with the supp function are used to skip some checks by SMT. 


Example 3. When applying the incremental SMT-based approach to our running example, c1 has to be decided by SMT, but c2 is skipped due to the No-Key rule. 

As for c3, since r1 ∈ dom(c2) ∩ unq(c3), c2 is a cut-point and the subtree rooted at c2 can be replaced by r1, leading to the simplified computation r1 ⊕ (r2 ⊕ k), which is subsequently skipped by the Sid rule with supp. Note that the Sid rule is not applicable to the original subtree, because r2 occurs in the support of both children of c3. 

There is no cut-point for c4, so it is checked using the SMT solver. But since c4 is semantically independent of r1 (a don't care variable), to reduce the SMT formula size, we replace r1 by 1 (or 0) when constructing the formula Ψ. 
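Heuristic (1) can be sketched on c4 from Example 3: substitute the don't care r1 by the constant 1, confirm that the function is unchanged on every input, and observe that only r2 remains to be enumerated, halving the number of copies in the SMT formula. The expression encoding and function names are illustrative assumptions.

```python
from itertools import product

# c4 of Fig. 2 as a hypothetical expression tree: ((r1 ^ r2) ^ (k ^ r2)) ^ (r1 ^ r2)
C4 = ("xor", ("xor", ("xor", "r1", "r2"), ("xor", "k", "r2")), ("xor", "r1", "r2"))


def substitute(expr, var, value):
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op, substitute(left, var, value), substitute(right, var, value))
    return value if expr == var else expr


def evaluate(expr, env):
    if isinstance(expr, tuple):
        op, left, right = expr
        a, b = evaluate(left, env), evaluate(right, env)
        return {"xor": a ^ b, "and": a & b, "or": a | b}[op]
    return expr if expr in (0, 1) else env[expr]


def random_vars(expr, random=("r1", "r2", "r3")):
    """Random bits that still occur in the expression (constants contribute nothing)."""
    if isinstance(expr, tuple):
        return random_vars(expr[1], random) | random_vars(expr[2], random)
    return {expr} & set(random)


def prune_dont_cares(expr, dont_cares, constant=1):
    """Replace each don't care random bit by a constant before building the formula."""
    for var in dont_cares:
        expr = substitute(expr, var, constant)
    return expr


pruned = prune_dont_cares(C4, ["r1"])
inputs = ("k", "r1", "r2")
same = all(
    evaluate(pruned, dict(zip(inputs, bits))) == evaluate(C4, dict(zip(inputs, bits)))
    for bits in product((0, 1), repeat=len(inputs))
)
print(same, sorted(random_vars(C4)), sorted(random_vars(pruned)))  # True ['r1', 'r2'] ['r2']
```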
4.2 Feeding SMT-Based Analysis Results Back to Type System 


Consider a scenario where initially the type system (cf. Sect. 3) failed to resolve a node l, i.e., [[l]] = UKD, but the SMT-based approach resolved it as either NPM or SID. Such results should be fed back to improve the type system, which may lead to the following two favorable outcomes: (1) marking more nodes as perfectly masked (RUD or SID) and (2) marking more nodes as leaky (NPM), which means we can avoid expensive SMT calls for these nodes. More specifically, if SMT-based analysis shows that l is perfectly masked, the type of l can be refined to [[l]] = SID; feeding it back to the type system allows us to infer more types for nodes that structurally depend on l. 

On the other hand, if SMT-based analysis shows l is not perfectly masked, the type of l can be refined to [[l]] = NPM; feeding it back allows the type system to infer that other nodes may be NPM as well. To achieve what is outlined in the second case above, we add the NPM-related type inference rules shown in Fig. 5. When they are added to the type system outlined in Fig. 3, more NPM type nodes will be deduced, which allows our method to skip the (more expensive) checking of NPM using SMT. 


Fig. 6. Example for feeding back. 


Example 4. Consider the example DDG in Fig. 6. By applying the original type inference approach with either supp or semd, we have 

[[c1]] = [[c4]] = RUD,  [[c2]] = [[c3]] = [[c6]] = SID,  [[c5]] = [[c7]] = UKD. 

In contrast, by applying SMT-based analysis to c5, we can deduce [[c5]] = SID. Feeding [[c5]] = SID back to the original type system, and then applying the Sid rule to c7 = c5 ⊕ c6, we are able to deduce [[c7]] = SID. Without refinement, this was not possible. 


4.3 The Overall Algorithm 


Having presented all the components, we now present the overall procedure, which integrates the semantic type system and the SMT-based method for gradual refinement. Algorithm 1 shows the pseudo code. Given the program P, the sets of public (p), secret (k), and random (r) variables, and an empty map π, it invokes SCInfer(P, p, k, r, π) to traverse the DDG in a topological order and annotate every node l with a distribution type from T. The subroutine TypeInfer implements the type inference rules outlined in Figs. 3 and 5, where the parameter f can be either supp or semd. 

SCInfer first deduces the type of each node l ∈ N by invoking TypeInfer with f = supp. Once a node l is annotated as UKD, a simplified tree P' of the subtree rooted at l is constructed. Next, TypeInfer with f = semd is invoked to resolve the UKD node in P'. If π(l) becomes non-UKD afterward, TypeInfer with f = supp is invoked again to quickly deduce the types of the fan-out nodes in P. But if π(l) remains UKD, SCInfer invokes the incremental SMT-based approach to decide whether l is SID or NPM. This is sound and complete, unless the SMT solver runs out of time/memory, in which case UKD is assigned to l. 
Algorithm 1. Function SCInfer(P, p, k, r, π) 

Function SCInfer(P, p, k, r, π)
  foreach l ∈ N in a topological order do
    if l is a leaf then π(l) := [[l]];
    else
      TypeInfer(l, P, p, k, r, π, supp);
      if π(l) = UKD then
        let P' be the simplified tree of the subtree rooted by l in P;
        TypeInfer(l, P', p, k, r, π, semd);
        if π(l) = UKD then
          res := CheckBySMT(P', p, k, r);
          if res = Not-Perfectly-Masked then π(l) := NPM;
          else if res = Perfectly-Masked then π(l) := SID;
          else π(l) := UKD;
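Putting the pieces together, the following Python sketch mirrors the control flow of Algorithm 1 on a hypothetical extension of the running example with two extra nodes, c7 = c1 ∧ c4 and c8 = c7 ∧ k. Several simplifications are assumptions of this sketch, not of SCInfer: CheckBySMT is replaced by exhaustive model counting (with p empty), the simplified tree P' and the NPM rules of Fig. 5 are omitted, and semd is computed by enumeration rather than SAT.

```python
from functools import lru_cache
from itertools import product

# Hypothetical extension of the Fig. 2 program with two extra nodes c7 and c8.
PROGRAM = {
    "c1": ("xor", "k", "r2"), "c2": ("xor", "r1", "r2"),
    "c3": ("xor", "c2", "c1"), "c4": ("xor", "c3", "c2"),
    "c5": ("xor", "c4", "r1"), "c6": ("and", "c5", "r3"),
    "c7": ("and", "c1", "c4"), "c8": ("and", "c7", "k"),
}
SECRET, RANDOM = ("k",), ("r1", "r2", "r3")
VARS, RSET, KSET = SECRET + RANDOM, frozenset(RANDOM), frozenset(SECRET)


def evaluate(node, env):
    if node not in PROGRAM:
        return env[node]
    op, left, right = PROGRAM[node]
    a, b = evaluate(left, env), evaluate(right, env)
    return {"xor": a ^ b, "and": a & b, "or": a | b}[op]


@lru_cache(maxsize=None)
def supp(node):
    if node not in PROGRAM:
        return frozenset({node})
    return supp(PROGRAM[node][1]) | supp(PROGRAM[node][2])


@lru_cache(maxsize=None)
def semd(node):
    deps = set()
    for r in RANDOM:
        rest = [v for v in VARS if v != r]
        for bits in product((0, 1), repeat=len(rest)):
            env = dict(zip(rest, bits))
            if evaluate(node, {**env, r: 0}) != evaluate(node, {**env, r: 1}):
                deps.add(r)
                break
    return frozenset(deps)


@lru_cache(maxsize=None)
def unq(node):
    if node not in PROGRAM:
        return supp(node) & RSET
    _, left, right = PROGRAM[node]
    return (unq(left) | unq(right)) - (supp(left) & supp(right))


@lru_cache(maxsize=None)
def dom(node):
    if node not in PROGRAM:
        return supp(node) & RSET
    op, left, right = PROGRAM[node]
    return (dom(left) | dom(right)) & unq(node) if op == "xor" else frozenset()


def type_infer(l, pi, f):
    """Simplified TypeInfer (Algorithm 2) without the NPM rules."""
    op, a, b = PROGRAM[l]
    fa, fb = f(a) & RSET, f(b) & RSET
    ok_a, ok_b = pi[a] not in ("UKD", "NPM"), pi[b] not in ("UKD", "NPM")
    if op == "xor" and ((pi[a] == "RUD" and dom(a) - fb) or (pi[b] == "RUD" and dom(b) - fa)):
        pi[l] = "RUD"
    elif op != "xor" and not (fa & fb) and ((pi[a] == "RUD" and ok_b) or (pi[b] == "RUD" and ok_a)):
        pi[l] = "SID"
    elif op != "xor" and pi[a] == pi[b] == "RUD" and ((dom(a) - fb) | (dom(b) - fa)):
        pi[l] = "SID"
    elif pi[a] == pi[b] == "SID" and not (fa & fb):
        pi[l] = "SID"
    elif not (supp(l) & KSET):
        pi[l] = "SID"
    else:
        pi[l] = "UKD"


def check_by_counting(l):
    """Stand-in for CheckBySMT: compare model counts over r for every secret value."""
    counts = {
        sum(evaluate(l, {**dict(zip(SECRET, k)), **dict(zip(RANDOM, r))})
            for r in product((0, 1), repeat=len(RANDOM)))
        for k in product((0, 1), repeat=len(SECRET))
    }
    return "SID" if len(counts) == 1 else "NPM"


def scinfer():
    pi = {v: ("RUD" if v in RSET else "UKD") for v in VARS}  # leaf types [[l]]
    for l in PROGRAM:  # insertion order is topological here
        type_infer(l, pi, supp)  # cheap structural pass
        if pi[l] == "UKD":
            type_infer(l, pi, semd)  # semantic pass (P' simplification omitted)
        if pi[l] == "UKD":
            pi[l] = check_by_counting(l)
    return pi


print(scinfer())
```

Under these assumptions, c1 to c5 come out RUD and c6 SID from the type rules alone; c7 (which equals k ⊕ r2) stays UKD after both passes and is then proved SID by counting, while c8 is reported NPM.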


Theorem 4. For every intermediate computation result c ← i(p, k, r) labeled by l, our method in SCInfer guarantees to return sound and complete results: 

- π(l) = RUD iff c is uniformly distributed, and hence perfectly masked; 
- π(l) = SID iff c is secret independent, i.e., perfectly masked; 
- π(l) = NPM iff c is not perfectly masked (leaky). 


If a timeout or memory limit is used to bound the execution of the SMT solver, it is also possible that π(l) = UKD, meaning c has an unknown distribution (it may or may not be perfectly masked). It is interesting to note that, if we regard UKD as a potential leak and at the same time bound (or even disable) the SMT-based analysis, Algorithm 1 degenerates to a sound type system that is both fast and potentially accurate. 


5 Experiments 


We have implemented our method in a verification tool named SCInfer, which uses 
Z3 [23] as the underlying SMT solver. We also implemented the syntactic type infer- 
ence approach [47] and the incremental SMT-based approach [26,27] in the same tool 
for experimental comparison purposes. We conducted experiments on publicly avail- 
able cryptographic software implementations, including fragments of AES and MAC- 
Keccak [26,27]. Our experiments were conducted on a machine with 64-bit Ubuntu 
12.04 LTS, Intel Xeon(R) CPU E5-2603 v4, and 32 GB RAM. 

Overall, the results of our experiments show that (1) SCInfer is significantly more accurate than the prior syntactic type inference method [47]; indeed, it resolved tens of thousands of UKD cases reported by the prior technique; (2) SCInfer is about twice as fast as the prior SMT-based verification method [26,27], or faster, on the large programs while maintaining the same accuracy; for example, SCInfer verified the benchmark named P12 in a few seconds whereas the prior SMT-based method took more than an hour. 
Algorithm 2. Procedure TypeInfer(l, P, p, k, r, π, f) 

Procedure TypeInfer(l, P, p, k, r, π, f)
  if λ2(l) = ¬ then π(l) := π(l.lft);
  else if λ2(l) = ⊕ then
    if π(l.lft) = RUD ∧ dom(l.lft) \ f(l.rgt) ≠ ∅ then π(l) := RUD;
    else if π(l.rgt) = RUD ∧ dom(l.rgt) \ f(l.lft) ≠ ∅ then π(l) := RUD;
    else if π(l.rgt) = π(l.lft) = SID ∧ f(l.lft) ∩ f(l.rgt) ∩ r = ∅ then π(l) := SID;
    else if supp(l) ∩ k = ∅ then π(l) := SID;
    else π(l) := UKD;
  else
    if ((π(l.lft) = RUD ∧ π(l.rgt) ∉ {UKD, NPM}) ∨ (π(l.rgt) = RUD ∧ π(l.lft) ∉ {UKD, NPM}))
        ∧ f(l.lft) ∩ f(l.rgt) ∩ r = ∅ then π(l) := SID;
    else if (dom(l.rgt) \ f(l.lft)) ∪ (dom(l.lft) \ f(l.rgt)) ≠ ∅
        ∧ π(l.lft) = RUD ∧ π(l.rgt) = RUD then π(l) := SID;
    else if ((π(l.lft) = RUD ∧ π(l.rgt) = NPM) ∨ (π(l.rgt) = RUD ∧ π(l.lft) = NPM))
        ∧ f(l.lft) ∩ f(l.rgt) ∩ r = ∅ then π(l) := NPM;
    else if (π(l.lft) = RUD ∧ π(l.rgt) = NPM ∧ dom(l.lft) \ f(l.rgt) ≠ ∅)
        ∨ (π(l.rgt) = RUD ∧ π(l.lft) = NPM ∧ dom(l.rgt) \ f(l.lft) ≠ ∅) then π(l) := NPM;
    else if π(l.lft) = π(l.rgt) = SID ∧ f(l.lft) ∩ f(l.rgt) ∩ r = ∅ then π(l) := SID;
    else if supp(l) ∩ k = ∅ then π(l) := SID;
    else π(l) := UKD;


5.1 Benchmarks 


Table 1 shows the detailed statistics of the benchmarks, including seventeen examples (P1-P17), all of which have nonlinear operations. Columns 1 and 2 show the name of the program and a short description. Column 3 shows the number of instructions in the probabilistic Boolean program. Column 4 shows the number of DDG nodes denoting intermediate computation results. The remaining columns show the number of bits in the secret, public, and random variables, respectively. Note that the number of random variables used in each individual computation is far smaller than the number used in the whole program. All these programs are transformed into Boolean programs where each instruction has at most two operands. Since the statistics were collected from the transformed code, they may differ slightly from the statistics reported in prior work [26,27]. 

In particular, P1-P5 are masking examples originating from [10], P6-P7 originate from [15], P8-P9 are the reordered MAC-Keccak computation examples originating from [11], and P10-P11 are two experimental masking schemes for the Chi function in MAC-Keccak. Among the larger programs, P12-P17 are regenerations of the MAC-Keccak reference code submitted to the SHA-3 competition held by NIST, where P13-P16 implement the masking of Chi functions using different masking schemes and P17 implements the de-masking of the Pi function. 

Table 1. Benchmark statistics. 

Name  Description                                               #Loc  #Nodes  |k|  |p|   |r|
P1    CHES13 Masked Key Whitening                                 79      32   16   16    16
P2    CHES13 De-mask and then Mask                                67      38    8    0    16
P3    CHES13 AES Shift Rows                                       21       6    2    0     2
P4    CHES13 Messerges Boolean to Arithmetic (bit0)               23       6    2    0     2
P5    CHES13 Goubin Boolean to Arithmetic (bit0)                  27       ?    1    0     2
P6    Logic Design for AES S-Box (1st implementation)             32       9    2    0     2
P7    Logic Design for AES S-Box (2nd implementation)             40      11    2    0     3
P8    Masked Chi function MAC-Keccak (1st implementation)         59      18    3    0     4
P9    Masked Chi function MAC-Keccak (2nd implementation)         60      18    3    0     4
P10   Syn. Masked Chi func MAC-Keccak (1st implementation)        66      28    3    0     4
P11   Syn. Masked Chi func MAC-Keccak (2nd implementation)        66      28    3    0     4
P12   MAC-Keccak 512b Perfect masked                            426k    197k  288  288  3205
P13   MAC-Keccak 512b De-mask and then mask (compiler error)    426k    197k  288  288  3205
P14   MAC-Keccak 512b Not-perfect Masking of Chi function (v1)  426k    197k  288  288  3205
P15   MAC-Keccak 512b Not-perfect Masking of Chi function (v2)  429k    198k  288  288  3205
P16   MAC-Keccak 512b Not-perfect Masking of Chi function (v3)  426k    197k  288  288  3205
P17   MAC-Keccak 512b Unmasking of Pi function                  442k    205k  288  288  3205


5.2 Experimental Results 


We compared the performance of SCInfer, the purely syntactic type inference method (denoted Syn. Infer), and the incremental SMT-based method (denoted SMT App). Table 2 shows the results. Column 1 shows the name of each benchmark. Column 2 shows whether it is perfectly masked (ground truth). Columns 3-4 show the results of the purely syntactic type inference method, including the number of nodes inferred as UKD type and the time. Columns 5-7 (resp. Columns 8-10) show the results of the incremental SMT-based method (resp. our method SCInfer), including the number of leaky nodes (NPM type), the number of nodes actually checked by SMT, and the time. 

Compared with the syntactic type inference method, our approach is significantly more accurate (e.g., see P4, P5 and P15). Furthermore, the time taken by both methods is comparable on small programs. On the large programs that are not perfectly masked (i.e., P13-P17), our method is slower since SCInfer has to resolve the UKD nodes reported by syntactic inference using SMT. However, it is interesting to note that, on the perfectly masked large program (P12), our method is faster. 

Moreover, the UKD type nodes in P4, reported by the purely syntactic type inference method, are all proved to be perfectly masked by our semantic type inference system, without calling the SMT solver at all. As for the three UKD type nodes in P5, our method proves them all by invoking the SMT solver only twice; that is, the feedback of the new SID types (discovered by SMT) allows our type system to improve its accuracy, which turns the third UKD node into SID. 

Table 2. Experimental results: comparison of three approaches. 

              Syn. Infer [47]       SMT App [26,27]                  SCInfer
Name  Masked  UKD     Time          NPM     By SMT  Time             NPM     By SMT  Time
P1    No      16      ≈0s           16      16      0.39s            16      16      0.39s
P2    No      8       ≈0s           8       8       0.28s            8       8       0.57s
P3    Yes     0       ≈0s           0       0       ≈0s              0       0       ≈0s
P4    Yes     3       ≈0s           0       3       0.16s            0       0       0.06s
P5    Yes     3       ≈0s           0       3       0.15s            0       2       0.25s
P6    No      2       ≈0s           2       2       0.11s            2       2       0.16s
P7    No      2       0.01s         1       2       0.11s            1       1       0.26s
P8    No      3       ≈0s           3       3       0.15s            3       3       0.29s
P9    No      2       ≈0s           2       2       0.11s            2       2       0.23s
P10   No      3       ≈0s           1       2       0.15s            1       2       0.34s
P11   No      4       ≈0s           1       3       0.2s             1       3       0.5s
P12   Yes     0       1min 5s       0       0       92min 8s         0       0       3.8s
P13   No      4800    1min 11s      4800    4800    95min 30s        4800    4800    47min 8s
P14   No      3200    1min 11s      3200    3200    118min 1s        3200    3200    53min 40s
P15   No      3200    1min 21s      1600    3200    127min 45s       1600    3200    69min 6s
P16   No      4800    1min 13s      4800    4800    123min 54s       4800    4800    61min 15s
P17   No      17600   1min 14s      17600   16000   336min 51s       17600   12800   121min 28s

Finally, compared with the original SMT-based approach, our method is about twice as fast or faster on the large programs (P12-P17). Furthermore, the number of nodes actually checked by invoking the SMT solver is also lower than in the original SMT-based approach (e.g., P4-P5 and P17). In particular, there are 3200 UKD type nodes in P17 that are refined into the NPM type by our new inference rules (cf. Fig. 5), thus avoiding the more expensive SMT calls. 
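The speedups on the large programs can be recomputed from the Time columns of Table 2; the short script below simply converts the reported times to seconds and divides them (the dictionary layout and helper name are ours).

```python
def seconds(minutes, secs):
    """Convert a 'XXmin YYs' entry of Table 2 to seconds."""
    return 60 * minutes + secs


# Time columns of Table 2 for the large benchmarks.
SMT_APP = {"P12": seconds(92, 8), "P13": seconds(95, 30), "P14": seconds(118, 1),
           "P15": seconds(127, 45), "P16": seconds(123, 54), "P17": seconds(336, 51)}
SCINFER = {"P12": 3.8, "P13": seconds(47, 8), "P14": seconds(53, 40),
           "P15": seconds(69, 6), "P16": seconds(61, 15), "P17": seconds(121, 28)}

speedup = {name: SMT_APP[name] / SCINFER[name] for name in SMT_APP}
for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

With these figures, the speedup over the SMT-based approach ranges from about 1.85x (P15) to about 2.77x (P17) on P13-P17, and exceeds 1000x on the perfectly masked P12.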

To sum up, the results of our experiments show that SCInfer is fast in obtaining proofs for perfectly masked programs, while retaining the ability to detect real leaks in programs that are not perfectly masked, and is scalable enough to handle realistic applications. 


5.3 Detailed Statistics 


Table 3 shows more detailed statistics of our approach. Specifically, Columns 2-5 show the number of nodes of each distribution type deduced by our method. Column 6 shows the number of nodes actually checked by SMT, together with the time shown in Column 9. Column 7 shows the time spent on computing the semd function, which solves the SAT problem. Column 8 shows the time spent on computing the don't care variables. The last column shows the total time taken by SCInfer. 


172 J. Zhang et al. 


Table 3. Detailed statistics of our new method. 

       SCInfer: Nodes                         SCInfer: Time
Name   RUD     SID   CST  NPM    SMT          semd        Don't care  SMT         Total
P1     16      0     0    16     16           ≈0s         ≈0s         0.39s       0.39s
P2     16      0     0    8      8            0.27s       0.14s       0.16s       0.57s
P3     6       0     0    0      0            ≈0s         ≈0s         ≈0s         ≈0s
P4     6       0     0    0      0            ≈0s         ≈0s         ≈0s         0.06s
P5     6       2     0    0      2            0.08s       0.05s       0.05s       0.25s
P6     4       3     0    2      2            0.05s       0.07s       0.04s       0.16s
P7     5       5     0    1      1            0.14s       0.09s       0.03s       0.26s
P8     11      4     0    3      3            0.14s       0.09s       0.06s       0.29s
P9     12      4     0    2      2            0.13s       0.07s       0.03s       0.23s
P10    20      6     1    1      2            0.15s       0.14s       0.05s       0.34s
P11    19      7     1    1      3            0.23s       0.28s       0.07s       0.5s
P12    190400  6400  0    0      0            ≈0s         ≈0s         ≈0s         3.8s
P13    185600  6400  0    4800   4800         29min 33s   16min 5s    1min 25s    47min 8s
P14    187200  6400  0    3200   3200         26min 58s   25min 26s   11min 53s   53min 40s
P15    188800  8000  0    1600   3200         33min 30s   33min 55s   1min 35s    69min 6s
P16    185600  6400  0    4800   4800         26min 41s   32min 55s   1min 32s    61min 15s
P17    185600  1600  0    17600  12800        33min 25s   83min 59s   3min 57s    121min 28s


Results in Table 3 indicate that most of the DDG nodes in these benchmark programs are either RUD or SID, and almost all of them can be quickly deduced by our type system. This explains why our new method is more efficient than the original SMT-based approach. Indeed, the original SMT-based approach spent a large amount of time on the static analysis part, which does code partitioning and applies the heuristic rules (cf. Sect. 4.1), whereas our method spends more of its time on computing the semd function.

Column 4 shows that, at least in these benchmark programs, Boolean constants are rare. Columns 5–6 show that, when our refined type system fails to prove that a node is perfectly masked, the node is usually indeed not perfectly masked. Columns 7–9 show that, in our integrated method, most of the time is actually spent computing semd and the don’t care variables (SAT), while the time taken by the SMT solver to conduct model counting (SAT#) is relatively small.
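The distribution types counted in Table 3 can be illustrated with a brute-force check. The Python sketch below is our own illustration and not SCInfer's SMT encoding; it assumes that RUD denotes a uniformly distributed result, SID a secret-independent distribution, CST a constant, and NPM a result that is not perfectly masked, and the helper names `classify` and `count_ones` are hypothetical. For a single-bit node f(k, r) over secret bits k and random bits r, it counts, for every secret value, how many random assignments make f evaluate to 1. This count plays the role of the model-counting (SAT#) step, and its cost of 2^|r| evaluations per secret suggests why deducing most types with inference rules pays off.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]
Node = Callable[[Bits, Bits], int]  # f(secret_bits, random_bits) -> 0 or 1


def count_ones(f: Node, key: Bits, n_rand: int) -> int:
    """Model count: number of random assignments r with f(key, r) == 1."""
    return sum(f(key, r) for r in product((0, 1), repeat=n_rand))


def classify(f: Node, n_key: int, n_rand: int) -> str:
    """Hypothetical brute-force classifier for a single-bit intermediate result."""
    total = 2 ** n_rand
    # One model count per secret value: 2^n_key counts, each over 2^n_rand inputs.
    counts = {count_ones(f, k, n_rand) for k in product((0, 1), repeat=n_key)}
    if counts == {0} or counts == {total}:
        return "CST"  # same value for every secret and every random input
    if len(counts) == 1:
        (c,) = counts
        # Distribution identical for all secrets; uniform iff Pr[f = 1] = 1/2.
        return "RUD" if 2 * c == total else "SID"
    return "NPM"  # the distribution of f differs between two secret values


print(classify(lambda k, r: k[0] ^ r[0], 1, 1))           # RUD
print(classify(lambda k, r: (k[0] ^ r[0]) & r[1], 1, 2))  # SID
print(classify(lambda k, r: k[0] & r[0], 1, 1))           # NPM
print(classify(lambda k, r: r[0] ^ r[0], 1, 1))           # CST
```

The second example shows why SID is distinguished from RUD: (k ⊕ r0) ∧ r1 equals 1 with probability 1/4 for either secret, so it leaks nothing but is not uniform.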


6 Related Work 


Many masking countermeasures [15,17,34,37,41,43,46,48,50–52] have been published over the years: although they differ in adversary models, cryptographic algorithms and compactness, a common problem is the lack of efficient tools to formally prove their correctness [21,22]. Our work aims to bridge this gap. It differs from simulation-based techniques [3,33,53], which aim only to detect leaks rather than to prove their absence. It also differs from techniques designed for other types of side channels such as timing [2,38], fault [12,29] and cache [24,35,40], and from techniques that compute security bounds for probabilistic countermeasures against remote attacks [45].

Although some verification tools have been developed for this application [6,7,10,13,14,20,26,27,47], they are either fast but inaccurate (e.g., type-inference techniques) or accurate but slow (e.g., model-counting techniques). For example, Bayrak et al. [10] developed a leak detector that checks whether a computation result is logically dependent on the secret and, at the same time, logically independent of every random variable. It is fast but inaccurate in that many leaky nodes could be incorrectly proved to be secure [26,27]. In contrast, the model-counting based method proposed by Eldib et al. [26–28] is accurate, but also significantly less scalable because the size of the logical formulas it needs to build is exponential in the number of random variables. Moreover, for higher-order masking, their method is still not complete.

Our gradual refinement of a set of semantic type inference rules was inspired by recent work on proving probabilistic non-interference [6,47], which exploits the unique characteristics of invertible operations. Similar ideas were explored in [7,14,20] as well. However, these prior techniques differ significantly from our method because their type-inference rules are syntactic and fixed, whereas ours are semantic and refined by SMT-solver-based analysis (SAT and SAT#). In terms of accuracy, numerous unknowns occurred in the experimental results of [47], and two obviously perfectly masked cases were not proved in [6]. Finally, although higher-order masking was addressed by prior techniques [13], they were limited to linear operations, whereas our method can handle both first-order and higher-order masking with non-linear operations.

An alternative way to address the model-counting problem [4,18,19,32] is to use satisfiability modulo counting, which generalizes SMT with counting constraints [31]. Toward this end, Fredrikson and Jha [31] developed an efficient decision procedure for linear integer arithmetic (LIA) based on Barvinok’s algorithm [8] and also applied their approach to differential privacy.

Another related line of research is automatically synthesizing countermeasures [1,7,9,16,25,44,54] as opposed to verifying them. While the methods in [1,7,9,44] rely on compiler-like pattern matching, those in [16,25,54] use inductive program synthesis based on SMT solving. These emerging techniques, however, are orthogonal to the work reported in this paper. It would be interesting to investigate whether our approach could aid in the synthesis of masking countermeasures.


7 Conclusions and Future Work 


We have presented a refinement-based method for proving that a piece of cryptographic software code is free of power side-channel leaks. Our method relies on a set of semantic inference rules to reason about distribution types of intermediate computation results, coupled with an SMT-solver-based procedure for gradually refining these types to increase accuracy. We have implemented our method and demonstrated its efficiency and effectiveness on cryptographic benchmarks. Our results show that it outperforms state-of-the-art techniques in terms of both efficiency and accuracy.

For future work, we plan to evaluate our type inference system for higher-order masking, to extend it to handle integer programs directly instead of bit-blasting them to Boolean programs (e.g., using satisfiability modulo counting [31]), and to investigate the synthesis of masking countermeasures based on our new verification method.
Abstract. Given a model and a specification, the fundamental model-checking problem asks for algorithmic verification of whether the model satisfies the specification. We consider graphs and Markov decision processes (MDPs), which are fundamental models for reactive systems. One of the very basic specifications that arise in verification of reactive systems is the strong fairness (aka Streett) objective. Given different types of requests and corresponding grants, the objective requires that for each type, if the request event happens infinitely often, then the corresponding grant event must also happen infinitely often. All ω-regular objectives can be expressed as Streett objectives and hence they are canonical in verification. To handle the state-space explosion, symbolic algorithms are required that operate on a succinct implicit representation of the system rather than explicitly accessing the system. While explicit algorithms for graphs and MDPs with Streett objectives have been widely studied, there has been no improvement over the basic symbolic algorithms. The worst-case numbers of symbolic steps required by the basic symbolic algorithms are quadratic for graphs and cubic for MDPs. In this work we present the first sub-quadratic symbolic algorithm for graphs with Streett objectives, and our algorithm is sub-quadratic even for MDPs. Based on our algorithmic insights we present an implementation of the new symbolic approach and show that it improves the existing approach on several academic benchmark examples.


1 Introduction 


In this work we present faster symbolic algorithms for graphs and Markov deci- 
sion processes (MDPs) with strong fairness objectives. For the fundamental 
model-checking problem, the input consists of a model and a specification, and 
the algorithmic verification problem is to check whether the model satisfies the 
specification. We first describe the specific model-checking problem we consider 
and then our contributions. 
Models: Graphs and MDPs. Two standard models for reactive systems are graphs and Markov decision processes (MDPs). Vertices of a graph represent states of a reactive system, edges represent transitions of the system, and infinite paths of the graph represent non-terminating trajectories of the reactive system. MDPs extend graphs with probabilistic transitions that represent reactive systems with uncertainty. Thus graphs and MDPs are the de facto models of reactive systems with nondeterminism, and with both nondeterminism and stochastic aspects, respectively [3,19].


Specification: Strong Fairness (aka Streett) Objectives. A basic and fundamental property in the analysis of reactive systems is the strong fairness condition, which informally requires that if events are enabled infinitely often, then they must be executed infinitely often. More precisely, the strong fairness conditions (aka Streett objectives) consist of k types of requests and corresponding grants, and the objective requires that for each type, if the request happens infinitely often, then the corresponding grant must also happen infinitely often. After safety, reachability, and liveness, the strong fairness condition is one of the most standard properties that arise in the analysis of reactive systems, and chapters of standard textbooks in verification are devoted to it (e.g., [19, Chap. 3.3], [82, Chap. 3], [2, Chaps. 8, 10]). Moreover, all ω-regular objectives can be described by Streett objectives, e.g., LTL formulas and non-deterministic ω-automata can be translated to deterministic Streett automata [34], and efficient translation has been an active research area [16,23,28]. Thus Streett objectives are a canonical class of objectives that arise in verification.


Satisfaction. The basic notions of satisfaction for graphs and MDPs are as follows: 
For graphs the notion of satisfaction requires that there is a trajectory (infinite 
path) that belongs to the set of paths described by the Streett objective. For 
MDPs the satisfaction requires that there is a policy to resolve the nondetermin- 
ism such that the Streett objective is ensured almost-surely (with probability 1). 
Thus the algorithmic model-checking problem of graphs and MDPs with Streett 
objectives is a core problem in verification. 


Explicit vs Symbolic Algorithms. Traditional algorithmic studies consider explicit algorithms that operate on the explicit representation of the system. In contrast, implicit or symbolic algorithms only use a set of predefined operations and do not explicitly access the system [20]. The significance of symbolic algorithms in verification is as follows: to combat the state-space explosion, large systems must be represented succinctly and implicitly, and symbolic algorithms on such representations are scalable, whereas explicit algorithms do not scale because it is computationally too expensive even to construct the system explicitly.


Relevance. In this work we study symbolic algorithms for graphs and MDPs 
with Streett objectives. Symbolic algorithms for the analysis of graphs and 
MDPs are at the heart of many state-of-the-art tools such as SPIN, NuSMV 
for graphs [18,27] and PRISM, LiQuor, Storm for MDPs [17,22,29]. Our con- 
tributions are related to the algorithmic complexity of graphs and MDPs with 
Streett objectives for symbolic algorithms. We first present previous results and 
then our contributions. 
Previous Results. The most basic algorithm for the problem for graphs is based on repeated SCC (strongly connected component) computation, and informally can be described as follows: for a given SCC, (a) if for every request type that is present in the SCC the corresponding grant type is also present in the SCC, then the SCC is identified as “good”; (b) otherwise, the vertices of each request type that has no corresponding grant type in the SCC are removed, and the algorithm recursively proceeds on the remaining graph. Finally, reachability to good SCCs is computed. The current best-known symbolic algorithm for SCC computation requires $O(n)$ symbolic steps for graphs with $n$ vertices [25], and moreover, the algorithm is optimal [15]. For MDPs, the SCC computation has to be replaced by MEC (maximal end-component) computation, and the current best-known symbolic algorithm for MEC computation requires $O(n^2)$ symbolic steps. While there have been several explicit algorithms for graphs with Streett objectives [12,26], MEC computation [8–10], and MDPs with Streett objectives [7], as well as symbolic algorithms for MDPs with Büchi objectives [11], the current best-known bounds for symbolic algorithms with Streett objectives are obtained from the basic algorithms, which are $O(n \cdot \min(n, k))$ for graphs and $O(n^2 \cdot \min(n, k))$ for MDPs, where $k$ is the number of types of request-grant pairs.
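The basic algorithm for graphs can be sketched on explicit Python sets, where each call to `pre` or `post` stands in for one symbolic Pre/Post step. This is a hypothetical illustration, not a BDD-based implementation: the SCC routine is a plain forward-backward decomposition rather than the $O(n)$ symbolic algorithm of [25], and the names `streett_graph_winning`, `sccs`, and `closure` are ours.

```python
from typing import Iterator, List, Set, Tuple

Edge = Tuple[int, int]
Pair = Tuple[Set[int], Set[int]]  # (L_i, U_i): requests and grants


def pre(E: Set[Edge], Z: Set[int], S: Set[int]) -> Set[int]:
    """One Pre step inside G[S]: vertices of S with an edge into Z."""
    return {u for (u, v) in E if v in Z and u in S}


def post(E: Set[Edge], Z: Set[int], S: Set[int]) -> Set[int]:
    """One Post step inside G[S]: vertices of S with an edge from Z."""
    return {v for (u, v) in E if u in Z and v in S}


def closure(step, E: Set[Edge], start: Set[int], S: Set[int]) -> Set[int]:
    """Repeat a one-step operator until no new vertex of S is added."""
    reached = set(start)
    while True:
        new = step(E, reached, S) - reached
        if not new:
            return reached
        reached |= new


def sccs(E: Set[Edge], S: Set[int]) -> Iterator[Set[int]]:
    """Forward-backward SCC decomposition of G[S] (not the O(n) algorithm of [25])."""
    remaining = set(S)
    while remaining:
        v = next(iter(remaining))
        comp = closure(post, E, {v}, remaining) & closure(pre, E, {v}, remaining)
        yield comp
        remaining -= comp  # removing an SCC leaves the other SCCs unchanged


def streett_graph_winning(V: Set[int], E: Set[Edge], pairs: List[Pair]) -> Set[int]:
    """Basic algorithm: repeated SCC computation, then reachability to good SCCs."""
    good: Set[int] = set()
    worklist = [set(V)]
    while worklist:
        S = worklist.pop()
        for C in sccs(E, S):
            v = next(iter(C))
            if len(C) == 1 and (v, v) not in E:
                continue  # trivial SCC: no infinite path stays inside it
            bad: Set[int] = set()
            for L, U in pairs:
                if not (U & C):
                    bad |= L & C  # request present in C without its grant
            if bad:
                worklist.append(C - bad)  # remove bad vertices and recurse
            else:
                good |= C  # "good" SCC
    return closure(pre, E, good, set(V))  # vertices that can reach a good SCC


V = {0, 1, 2, 3, 4, 5}
E = {(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (4, 4), (5, 4)}
print(sorted(streett_graph_winning(V, E, [({0, 4}, {3})])))  # [0, 1, 2, 3]
```

In the example, the SCC {0, 1} requests 0 without granting 3, so 0 is removed and the remaining trivial SCC {1} is discarded; {2, 3} is good, and 0 and 1 still win by reaching it, whereas 4 and 5 are trapped at the request vertex 4.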


Our Contributions. In this work our main contributions are as follows: 


— We present a symbolic algorithm that requires $O(n \cdot \sqrt{m \log n})$ symbolic steps, both for graphs and MDPs, where $m$ is the number of edges. In the case $k = O(n)$, the previous worst-case bounds are quadratic ($O(n^2)$) for graphs and cubic ($O(n^3)$) for MDPs. In contrast, we present the first sub-quadratic symbolic algorithm both for graphs and for MDPs. Moreover, in practice, since most graphs are sparse (with $m = O(n)$), the worst-case bounds of our symbolic algorithm in these cases are $O(n \cdot \sqrt{n \log n})$. Another interesting contribution of our work is that we also present an algorithm for MEC decomposition that requires $O(n \cdot \sqrt{m})$ symbolic steps, which is relevant for our results as well as of independent interest, as MEC decomposition is used in many other algorithmic problems related to MDPs. Our results are summarized in Table 1.

— While our main contribution is theoretical, based on the algorithmic insights 
we also present a new symbolic algorithm implementation for graphs and 
MDPs with Streett objectives. We show that the new algorithm improves (by 
around 30%) the basic algorithm on several academic benchmark examples 
from the VLTS benchmark suite [21]. 


Technical Contributions. The two key technical contributions of our work are as 
follows: 


— Symbolic Lock Step Search: We search for newly emerged SCCs by a local 
graph exploration around vertices that lost adjacent edges. In order to find 
small new SCCs first, all searches are conducted “in parallel”, i.e., in lock- 
step, and the searches stop as soon as the first one finishes successfully. This 
approach has successfully been used to improve explicit algorithms [7,9, 14, 26]. 
Our contribution is a non-trivial symbolic variant (Sect. 3) which lies at the
core of the theoretical improvements. 
Table 1. Symbolic algorithms for Streett objectives and MEC decomposition.


Problem | Basic algorithm (symbolic operations) | Improved algorithm (symbolic operations) | Reference
Graphs with Streett | $O(n \cdot \min(n, k))$ | $O(n \cdot \sqrt{m \log n})$ | Theorem 2
MDPs with Streett | $O(n^2 \cdot \min(n, k))$ | $O(n \cdot \sqrt{m \log n})$ | Theorem 4
MEC decomposition | $O(n^2)$ | $O(n \cdot \sqrt{m})$ | Theorem 3


— Symbolic Interleaved MEC Computation: For MDPs the identification of ver- 
tices that have to be removed can be interleaved with the computation of 
MECs such that in each iteration the computation of SCCs instead of MECs 
is sufficient to make progress [7]. We present a symbolic variant of this inter- 
leaved computation. This interleaved MEC computation is the basis for apply- 
ing the lock-step search to MDPs. 


2 Definitions 


2.1 Basic Problem Definitions 


Markov Decision Processes (MDPs) and Graphs. An MDP $P = ((V, E), (V_1, V_R), \delta)$ consists of a finite directed graph $G = (V, E)$ with a set of $n$ vertices $V$ and a set of $m$ edges $E$, a partition of the vertices into player 1 vertices $V_1$ and random vertices $V_R$, and a probabilistic transition function $\delta$. We call an edge $(u, v)$ with $u \in V_1$ a player 1 edge and an edge $(v, w)$ with $v \in V_R$ a random edge. For $v \in V$ we define $\mathrm{In}(v) = \{w \in V \mid (w, v) \in E\}$ and $\mathrm{Out}(v) = \{w \in V \mid (v, w) \in E\}$. The probabilistic transition function is a function from $V_R$ to $\mathcal{D}(V)$, where $\mathcal{D}(V)$ is the set of probability distributions over $V$, and for $v \in V_R$ the random edge $(v, w)$ is in $E$ if and only if $\delta(v)[w] > 0$. Graphs are a special case of MDPs with $V_R = \emptyset$.


Plays and Strategies. A play or infinite path in $P$ is an infinite sequence $\omega = \langle v_0, v_1, v_2, \ldots \rangle$ such that $(v_i, v_{i+1}) \in E$ for all $i \in \mathbb{N}$; we denote by $\Omega$ the set of all plays. A player 1 strategy $\sigma : V^* \cdot V_1 \to V$ is a function that assigns to every finite prefix $\omega \in V^* \cdot V_1$ of a play that ends in a player 1 vertex $v$ a successor vertex $\sigma(\omega) \in V$ such that $(v, \sigma(\omega)) \in E$; we denote by $\Sigma$ the set of all player 1 strategies. A strategy is memoryless if we have $\sigma(\omega) = \sigma(\omega')$ for any $\omega, \omega' \in V^* \cdot V_1$ that end in the same vertex $v \in V_1$.


Objectives. An objective $\phi$ is a subset of $\Omega$ said to be winning for player 1. We say that a play $\omega \in \Omega$ satisfies the objective if $\omega \in \phi$. For a vertex set $T \subseteq V$ the reachability objective is the set of infinite paths that contain a vertex of $T$, i.e., $\mathrm{Reach}(T) = \{\langle v_0, v_1, v_2, \ldots \rangle \in \Omega \mid \exists j \geq 0 : v_j \in T\}$. Let $\mathrm{Inf}(\omega)$ for $\omega \in \Omega$ denote the set of vertices that occur infinitely often in $\omega$. Given a set $\mathrm{TP}$ of $k$ pairs $(L_i, U_i)$ of vertex sets $L_i, U_i \subseteq V$ with $1 \leq i \leq k$, the Streett objective is the set of infinite paths for which it holds for each $1 \leq i \leq k$ that whenever a vertex of $L_i$ occurs infinitely often, then a vertex of $U_i$ occurs infinitely often, i.e., $\mathrm{Streett}(\mathrm{TP}) = \{\omega \in \Omega \mid L_i \cap \mathrm{Inf}(\omega) = \emptyset \text{ or } U_i \cap \mathrm{Inf}(\omega) \neq \emptyset \text{ for all } 1 \leq i \leq k\}$.
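For intuition, both objectives can be evaluated on ultimately periodic (“lasso”) plays, written as a finite prefix followed by a cycle repeated forever; for such a play, Inf(ω) is exactly the set of vertices on the cycle. The Python sketch below assumes this lasso representation, which is our own illustration rather than part of the definitions, and it does not check that the vertices actually form a path of P; the function names are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[Set[int], Set[int]]  # (L_i, U_i)


def inf_of_lasso(prefix: List[int], cycle: List[int]) -> Set[int]:
    """Inf(omega) for the play prefix . cycle^omega: exactly the vertices on the cycle."""
    if not cycle:
        raise ValueError("a lasso needs a non-empty cycle")
    return set(cycle)


def in_reach(prefix: List[int], cycle: List[int], target: Set[int]) -> bool:
    """omega is in Reach(T) iff some vertex of the play lies in T."""
    return bool(target & (set(prefix) | set(cycle)))


def in_streett(prefix: List[int], cycle: List[int], pairs: List[Pair]) -> bool:
    """omega is in Streett(TP) iff, for every pair, L_i misses Inf or U_i meets Inf."""
    inf = inf_of_lasso(prefix, cycle)
    return all(not (L & inf) or bool(U & inf) for L, U in pairs)


pairs = [({1}, {2})]  # request vertex 1, grant vertex 2
print(in_streett([0], [1, 2], pairs))  # True: request and grant both recur
print(in_streett([0], [1, 3], pairs))  # False: request recurs, grant never does
print(in_streett([1], [3], pairs))     # True: request occurs only finitely often
```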


182 K. Chatterjee et al. 


Almost-Sure Winning Sets. For any measurable set of plays $A \subseteq \Omega$ we denote by $\Pr_v^{\sigma}(A)$ the probability that a play starting at $v \in V$ belongs to $A$ when player 1 plays strategy $\sigma$. A strategy $\sigma$ is almost-sure (a.s.) winning from a vertex $v \in V$ for an objective $\phi$ if $\Pr_v^{\sigma}(\phi) = 1$. The almost-sure winning set $\langle\langle 1 \rangle\rangle_{as}(P, \phi)$ of player 1 is the set of vertices for which player 1 has an almost-sure winning strategy. In graphs the existence of an almost-sure winning strategy corresponds to the existence of a play in the objective, and the set of vertices for which player 1 has an (almost-sure) winning strategy is called the winning set $\langle\langle 1 \rangle\rangle(P, \phi)$ of player 1.


Symbolic Encoding of MDPs. Symbolic algorithms operate on sets of vertices, which are usually described by Binary Decision Diagrams (BDDs) [1,30]. In particular, Ordered Binary Decision Diagrams (OBDDs) [6] provide a canonical symbolic representation of Boolean functions. For the computation of almost-sure winning sets of MDPs it is sufficient to encode MDPs with OBDDs and one additional bit that denotes whether a vertex is in $V_1$ or $V_R$.


Symbolic Steps. One symbolic step corresponds to one primitive operation as supported by standard symbolic packages like CUDD [35]. In this paper we only allow the same basic set-based symbolic operations as in [5,11,24,33], namely set operations and the following one-step symbolic operations for a set of vertices $Z$: (a) the one-step predecessor operator $\mathrm{Pre}(Z) = \{v \in V \mid \mathrm{Out}(v) \cap Z \neq \emptyset\}$; (b) the one-step successor operator $\mathrm{Post}(Z) = \{v \in V \mid \mathrm{In}(v) \cap Z \neq \emptyset\}$; and (c) the one-step controllable predecessor operator $\mathrm{CPre}_R(Z) = \{v \in V_1 \mid \mathrm{Out}(v) \subseteq Z\} \cup \{v \in V_R \mid \mathrm{Out}(v) \cap Z \neq \emptyset\}$, i.e., the $\mathrm{CPre}_R$ operator computes all vertices such that the successor belongs to $Z$ with positive probability. This operator can be defined using the Pre operator and basic set operations as follows: $\mathrm{CPre}_R(Z) = \mathrm{Pre}(Z) \setminus (V_1 \cap \mathrm{Pre}(V \setminus Z))$. We additionally allow cardinality computation and picking an arbitrary vertex from a set as in [11].


Symbolic Model. Informally, a symbolic algorithm does not operate on an explicit representation of the transition function of a graph, but instead accesses it through Pre and Post operations. For explicit algorithms, a Pre/Post operation on a set of vertices (resp., a single vertex) requires $O(m)$ time (resp., time on the order of the indegree/outdegree of the vertex). In contrast, for symbolic algorithms Pre/Post operations are considered unit-cost. Thus an interesting algorithmic question is
whether better algorithmic bounds can be obtained considering Pre/Post as unit 
operations. Moreover, the basic set operations are computationally less expen- 
sive (as they encode the relationship between the state variables) compared to 
the Pre/Post symbolic operations (as they encode the transitions and thus the 
relationship between the present and the next-state variables). In all presented 
algorithms, the number of set operations is asymptotically at most the number 
of Pre/Post operations. Hence in the sequel we focus on the number of Pre/Post 
operations of algorithms. 


Algorithmic Problem. Given an MDP $P$ (resp. a graph $G$) and a set of Streett pairs $\mathrm{TP}$, the problem we consider asks for a symbolic algorithm to compute the almost-sure winning set $\langle\langle 1 \rangle\rangle_{as}(P, \mathrm{Streett}(\mathrm{TP}))$ (resp. the winning set $\langle\langle 1 \rangle\rangle(G, \mathrm{Streett}(\mathrm{TP}))$), which is also called the qualitative analysis of MDPs (resp. graphs).


2.2 Basic Concepts Related to Algorithmic Solution 


Reachability. For a graph $G = (V, E)$ and a set of vertices $S \subseteq V$ the set $\mathrm{GraphReach}(G, S)$ is the set of vertices of $V$ that can reach a vertex of $S$ within $G$, and it can be identified with at most $|\mathrm{GraphReach}(G, S) \setminus S| + 1$ many Pre operations.
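The bound holds because every Pre operation except the last one adds at least one new vertex. A minimal explicit-set sketch in Python (the function name `graph_reach` is hypothetical) that counts the Pre operations of this backward search:

```python
from typing import Set, Tuple

Edge = Tuple[int, int]


def graph_reach(E: Set[Edge], S: Set[int]) -> Tuple[Set[int], int]:
    """Backward search from S with one Pre operation per iteration.

    Returns GraphReach(G, S) and the number of Pre operations used.
    """
    reached = set(S)
    pre_ops = 0
    while True:
        pre_ops += 1
        new = {u for (u, v) in E if v in reached} - reached  # one Pre operation
        if not new:  # fixpoint: the last Pre added nothing
            return reached, pre_ops
        reached |= new  # every non-final Pre adds at least one vertex


E = {(0, 1), (1, 2), (2, 3), (4, 4)}  # a chain 0 -> 1 -> 2 -> 3 and a separate loop at 4
R, ops = graph_reach(E, {3})
print(sorted(R), ops, len(R - {3}) + 1)  # [0, 1, 2, 3] 4 4
```

On the chain the bound is tight: three Pre operations each add one vertex, and a fourth detects the fixpoint.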


Strongly Connected Components. For a set of vertices $S \subseteq V$ we denote by $G[S] = (S, E \cap (S \times S))$ the subgraph of the graph $G$ induced by the vertices of $S$. An induced subgraph $G[S]$ is strongly connected if there exists a path in $G[S]$ between every pair of vertices of $S$. A strongly connected component (SCC) of $G$ is a set of vertices $C \subseteq V$ such that the induced subgraph $G[C]$ is strongly connected and $C$ is a maximal set in $V$ with this property. We call an SCC trivial if it only contains a single vertex and no edges, and non-trivial otherwise. The SCCs of $G$ partition its vertices and can be found in $O(n)$ symbolic steps [25]. A bottom SCC $C$ in a directed graph $G$ is an SCC with no edges from vertices of $C$ to vertices of $V \setminus C$, i.e., an SCC without outgoing edges. Analogously, a top SCC $C$ is an SCC with no incoming edges from $V \setminus C$. For more intuition for
bottom and top SCCs, consider the graph in which each SCC is contracted into 
a single vertex (ignoring edges within an SCC). In the resulting directed acyclic 
graph the sinks represent the bottom SCCs and the sources represent the top 
SCCs. Note that every graph has at least one bottom and at least one top SCC. 
If the graph is not strongly connected, then there exist at least one top and at 
least one bottom SCC that are disjoint and thus one of them contains at most 
half of the vertices of G. 


Random Attractors. In an MDP $P$ the random attractor $\mathrm{Attr}_R(P, W)$ of a set of vertices $W$ is defined as $\mathrm{Attr}_R(P, W) = \bigcup_{j \geq 0} Z_j$ where $Z_0 = W$ and $Z_{j+1} = Z_j \cup \mathrm{CPre}_R(Z_j)$ for all $j \geq 0$. The attractor can be computed with at most $|\mathrm{Attr}_R(P, W) \setminus W| + 1$ many $\mathrm{CPre}_R$ operations.
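The fixpoint iteration can be sketched in Python on explicit sets, using the identity $\mathrm{CPre}_R(Z) = \mathrm{Pre}(Z) \setminus (V_1 \cap \mathrm{Pre}(V \setminus Z))$ and counting the $\mathrm{CPre}_R$ calls. The function names are hypothetical, and the sketch assumes every vertex has at least one outgoing edge. In the example, vertex 4 belongs to player 1 and has a self-loop, so player 1 can avoid $W$ forever from there and 4 is not attracted.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]


def pre(E: Set[Edge], Z: Set[int]) -> Set[int]:
    return {u for (u, v) in E if v in Z}


def cpre_r(V: Set[int], V1: Set[int], E: Set[Edge], Z: Set[int]) -> Set[int]:
    """CPre_R(Z) = Pre(Z) without the player 1 vertices that can leave Z."""
    return pre(E, Z) - (V1 & pre(E, V - Z))


def random_attractor(V: Set[int], V1: Set[int], E: Set[Edge], W: Set[int]) -> Tuple[Set[int], int]:
    """Attr_R(P, W) via Z_{j+1} = Z_j | CPre_R(Z_j); also returns the number of CPre_R calls."""
    Z, ops = set(W), 0
    while True:
        ops += 1
        Z_next = Z | cpre_r(V, V1, E, Z)
        if Z_next == Z:  # fixpoint reached
            return Z, ops
        Z = Z_next


V, V1 = {0, 1, 2, 3, 4}, {0, 1, 3, 4}  # vertex 2 is the only random vertex
E = {(0, 1), (0, 3), (1, 3), (2, 0), (2, 3), (3, 3), (4, 4), (4, 3)}
A, ops = random_attractor(V, V1, E, {3})
print(sorted(A), ops)  # [0, 1, 2, 3] 3
```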


Maximal End-Components. Let $X$ be a vertex set without outgoing random edges, i.e., with $\mathrm{Out}(v) \subseteq X$ for all $v \in X \cap V_R$. A sub-MDP of an MDP $P$ induced by a vertex set $X \subseteq V$ without outgoing random edges is defined as $P[X] = ((X, E \cap (X \times X)), (V_1 \cap X, V_R \cap X), \delta)$. Note that the requirement that $X$ has no outgoing random edges is necessary in order to use the same probabilistic transition function $\delta$. An end-component (EC) of an MDP $P$ is a set of vertices $X \subseteq V$ such that (a) $X$ has no outgoing random edges, i.e., $P[X]$ is a valid sub-MDP, (b) the induced sub-MDP $P[X]$ is strongly connected, and (c) $P[X]$ contains at least one edge. Intuitively, an end-component is a set of vertices for which player 1 can ensure that the play stays within the set and almost-surely reaches all the vertices in the set (infinitely often). An end-component is a maximal end-component (MEC) if it is maximal under set inclusion. An end-component is trivial if it consists of a single vertex (with a self-loop), otherwise it is non-trivial. The MEC decomposition of an MDP consists of all MECs of the MDP.
Good End-Components. All algorithms for MDPs with Streett objectives are
based on finding good end-components, defined below. Given the union of all 
good end-components, the almost-sure winning set is obtained by computing the 
almost-sure winning set for the reachability objective with the union of all good 
end-components as the target set. The correctness of this approach is shown in 
[7,31] (see also [3, Chap. 10.6.3]). For Streett objectives a good end-component is 
defined as follows. In the special case of graphs they are called good components. 


Definition 1 (Good end-component). Given an MDP $P$ and a set $\mathrm{TP} = \{(L_j, U_j) \mid 1 \leq j \leq k\}$ of target pairs, a good end-component is an end-component $X$ of $P$ such that for each $1 \leq j \leq k$ either $L_j \cap X = \emptyset$ or $U_j \cap X \neq \emptyset$. A maximal good end-component is a good end-component that is maximal with respect to set inclusion.


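Lemma 1 (Correctness of Computing Good End-Components [31, Corollary 2.6.5, Proposition 2.6.9]). For an MDP $P$ and a set $\mathrm{TP}$ of target pairs, let $\mathcal{X}$ be the set of all maximal good end-components. Then $\langle\langle 1 \rangle\rangle_{as}(P, \mathrm{Reach}(\bigcup_{X \in \mathcal{X}} X))$ is equal to $\langle\langle 1 \rangle\rangle_{as}(P, \mathrm{Streett}(\mathrm{TP}))$.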


Iterative Vertex Removal. All the algorithms for Streett objectives maintain ver- 
tex sets that are candidates for good end-components. For such a vertex set S 
we (a) refine the maintained sets according to the SCC decomposition of P[S] 
and (b) for a set of vertices W for which we know that it cannot be contained in 
a good end-component, we remove its random attractor from S. The following 
lemma shows the correctness of these operations. 


Lemma 2 (Correctness of Vertex Removal [31, Lemma 2.6.10]). Given an MDP $P = ((V, E), (V_1, V_R), \delta)$, let $X$ be an end-component with $X \subseteq S$ for some $S \subseteq V$. Then


(a) $X \subseteq C$ for one SCC $C$ of $P[S]$ and
(b) $X \subseteq S \setminus \mathrm{Attr}_R(P', W)$ for each $W \subseteq V \setminus X$ and each sub-MDP $P'$ containing $X$.


Let $X$ be a good end-component. Then $X$ is an end-component and for each index $j$, $X \cap U_j = \emptyset$ implies $X \cap L_j = \emptyset$. Hence we obtain the following corollary.


Corollary 1 ([31, Corollary 4.2.2]). Given an MDP $P$, let $X$ be a good end-component with $X \subseteq S$ for some $S \subseteq V$. For each $i$ with $S \cap U_i = \emptyset$ it holds that $X \subseteq S \setminus \mathrm{Attr}_R(P[S], L_i \cap S)$.


For an index $j$ with $S \cap U_j = \emptyset$ we call the vertices of $S \cap L_j$ bad vertices. The set of all bad vertices $\mathrm{Bad}(S) = \bigcup_{1 \leq i \leq k} \{v \in L_i \cap S \mid U_i \cap S = \emptyset\}$ can be computed with $2k$ set operations.
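A direct Python transcription of $\mathrm{Bad}(S)$ on explicit sets, with a hypothetical function name; it performs two intersections with $S$ per pair, one for the grants and one for the requests:

```python
from typing import List, Set, Tuple

Pair = Tuple[Set[int], Set[int]]  # (L_i, U_i)


def bad(S: Set[int], pairs: List[Pair]) -> Set[int]:
    """Bad(S): vertices of L_i & S for every pair i whose U_i does not meet S."""
    result: Set[int] = set()
    for L, U in pairs:
        if not (U & S):        # intersection of the grants U_i with S is empty
            result |= L & S    # so the requests L_i & S are bad vertices
    return result


S = {1, 2, 3}
pairs = [({1}, {5}), ({2}, {3}), ({4}, {6})]
print(sorted(bad(S, pairs)))  # [1]
```

Only the first pair contributes: the second pair has its grant 3 inside $S$, and the third pair has no request inside $S$.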


3 Symbolic Divide-and-Conquer with Lock-Step Search 


In this section we present a symbolic version of the lock-step search for strongly 
connected subgraphs [26]. This symbolic version is used in all subsequent results, 
i.e., the sub-quadratic symbolic algorithms for graphs and MDPs with Streett
objectives, and for MEC decomposition. 


Divide-and-Conquer. The common property of the algorithmic problems we con- 
sider in this work is that the goal is to identify subgraphs of the input graph 
G = (V,E) that are strongly connected and satisfy some additional proper- 
ties. The difference between the problems lies in the required additional proper- 
ties. We describe and analyze the Procedure LOCK-STEP-SEARCH that we use 
in all our improved algorithms to efficiently implement a divide-and-conquer 
approach based on the requirement of strong connectivity, that is, we divide 
a subgraph G[S], induced by a set of vertices S, into two parts that are not 
strongly connected within G[S] or detect that G[S] is strongly connected. 


Start Vertices of Searches. The input to Procedure LOCK-STEP-SEARCH is a set of vertices $S \subseteq V$ and two subsets of $S$ denoted by $H_S$ and $T_S$. In the algorithms that call the procedure as a subroutine, vertices contained in $H_S$ have lost incoming edges (i.e., they were a “head” of a lost edge) and vertices contained in $T_S$ have lost outgoing edges (i.e., they were a “tail” of a lost edge) since the last time a superset of $S$ was identified as being strongly connected. For each vertex $h$ of $H_S$ the procedure conducts a backward search (i.e., a sequence of Pre operations) within $G[S]$ to find the vertices of $S$ that can reach $h$; and analogously a forward search (i.e., a sequence of Post operations) from each vertex $t$ of $T_S$ is conducted.


Intuition for the Choice of Start Vertices. If the subgraph $G[S]$ is not strongly connected, then it contains at least one top SCC and at least one bottom SCC that are disjoint. Further, if for a superset $S' \supseteq S$ the subgraph $G[S']$ was strongly connected, then each top SCC of $G[S]$ contains a vertex that had an additional incoming edge in $G[S']$ compared to $G[S]$, and analogously each bottom SCC of $G[S]$ contains a vertex that had an additional outgoing edge. Thus by keeping track of the vertices that lost incoming or outgoing edges, the following invariant will be maintained by all our improved algorithms.


Invariant 1 (Start Vertices Sufficient). We have $H_S, T_S \subseteq S$. Either (a) $H_S \cup T_S = \emptyset$ and $G[S]$ is strongly connected or (b) at least one vertex of each top SCC of $G[S]$ is contained in $H_S$ and at least one vertex of each bottom SCC of $G[S]$ is contained in $T_S$.


Lock-Step Search. The searches from the vertices of $H_S \cup T_S$ are performed in lock-step, that is, (a) one step is performed in each of the searches before the next step of any search is done and (b) all searches stop as soon as the first of the searches finishes. This is implemented in Procedure LOCK-STEP-SEARCH as follows. A step in the search from a vertex $t \in T_S$ (and analogously for $h \in H_S$) corresponds to the execution of the iteration of the for-each loop for $t \in T_S$. In an iteration of a for-each loop we might discover that we do not need to consider this search further (see the paragraph on ensuring strong connectivity below) and update the set $T_S$ (via $T'_S$) for future iterations accordingly. Otherwise the set $C_t$ is either strictly increasing in this step of the search or the search for $t$
Procedure LOCK-STEP-SEARCH(G, S, H_S, T_S)


/* Pre and Post defined w.r.t. G */
1  foreach v ∈ H_S ∪ T_S do C_v ← {v}
2  while true do
3      H'_S ← H_S, T'_S ← T_S
4      foreach h ∈ H_S do  /* search for top SCC */
5          C'_h ← (C_h ∪ Pre(C_h)) ∩ S
6          if |C'_h ∩ H'_S| > 1 then H'_S ← H'_S \ {h}
7          else
8              if C'_h = C_h then return (C_h, H'_S, T'_S)
9              C_h ← C'_h
10     foreach t ∈ T_S do  /* search for bottom SCC */
11         C'_t ← (C_t ∪ Post(C_t)) ∩ S
12         if |C'_t ∩ T'_S| > 1 then T'_S ← T'_S \ {t}
13         else
14             if C'_t = C_t then return (C_t, H'_S, T'_S)
15             C_t ← C'_t
16     H_S ← H'_S, T_S ← T'_S


terminates and we return the set of vertices in $G[S]$ that are reachable from $t$. So the two for-each loops over the vertices of $T_S$ and $H_S$ that are executed in an iteration of the while-loop perform one step of each of the searches, and the while-loop stops as soon as a search stops, i.e., a return statement is executed; hence this implements properties (a) and (b) of lock-step search. Note that the while-loop terminates, i.e., a return statement is executed eventually, because for all $t \in T_S$ (and resp. for all $h \in H_S$) the sets $C_t$ are monotonically increasing over the iterations of the while-loop, we have $C_t \subseteq S$, and if some set $C_t$ does not increase in an iteration, then either $t$ is removed from $T_S$ and thus not considered further or a return statement is executed. Note that when a search from a vertex $t \in T_S$ stops, it has discovered a maximal set of vertices $C$ that can be reached from $t$; and analogously for $h \in H_S$. Figure 1 shows a small intuitive example of a call to the procedure.
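To experiment with Procedure LOCK-STEP-SEARCH, it can be transcribed onto explicit Python sets. This is a sketch, not the authors' implementation: the edge set `E` stands in for the symbolic Pre and Post operators, the names are hypothetical, and we keep separate search sets for backward and forward searches so that a vertex in both $H_S$ and $T_S$ gets two independent searches. The sketch assumes Invariant 1 and raises an error when $H_S \cup T_S$ is empty, since the while-loop would otherwise never return.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def lock_step_search(
    E: Set[Edge], S: Set[int], H: Set[int], T: Set[int]
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Explicit-set transcription of LOCK-STEP-SEARCH(G, S, H_S, T_S).

    Returns (a top or bottom SCC of G[S], updated H_S, updated T_S).
    """
    if not (H or T):
        raise ValueError("H_S and T_S must not both be empty")

    def pre(Z: Set[int]) -> Set[int]:
        return {u for (u, v) in E if v in Z and u in S}

    def post(Z: Set[int]) -> Set[int]:
        return {v for (u, v) in E if u in Z and v in S}

    CH: Dict[int, Set[int]] = {h: {h} for h in H}  # backward searches
    CT: Dict[int, Set[int]] = {t: {t} for t in T}  # forward searches
    H, T = set(H), set(T)
    while True:
        H2, T2 = set(H), set(T)  # H'_S and T'_S
        for h in H:  # one backward step per search: looking for a top SCC
            new = CH[h] | pre(CH[h])
            if len(new & H2) > 1:
                H2.discard(h)  # met another start vertex: drop this search
            elif new == CH[h]:
                return CH[h], H2, T2  # search saturated: a top SCC
            else:
                CH[h] = new
        for t in T:  # one forward step per search: looking for a bottom SCC
            new = CT[t] | post(CT[t])
            if len(new & T2) > 1:
                T2.discard(t)
            elif new == CT[t]:
                return CT[t], H2, T2  # search saturated: a bottom SCC
            else:
                CT[t] = new
        H, T = H2, T2


# Top SCC {0, 1} and bottom SCC {2, 3}; the edge (2, 1) was deleted earlier,
# so head 1 lost an incoming edge and tail 2 lost an outgoing edge.
E = {(0, 1), (1, 0), (1, 2), (2, 3), (3, 2)}
C, H_out, T_out = lock_step_search(E, {0, 1, 2, 3}, {1}, {2})
print(sorted(C))  # [0, 1]
```

Here both searches would saturate in their second step; the backward search from head 1 is processed first in that iteration, so the top SCC {0, 1} is returned. Checking against the primed sets $H'_S$ and $T'_S$ (rather than $H_S$ and $T_S$) ensures that when two start vertices of the same SCC meet, only one of the two searches is dropped.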


Comparison to Explicit Algorithm. In the explicit version of the algorithm [7,26] the search from vertex $t \in T_S$ performs a depth-first search that terminates exactly when every edge reachable from $t$ is explored. Since any search that starts outside of a bottom SCC but reaches the bottom SCC has to explore more edges than the search started inside of the bottom SCC, the first search from a vertex of $T_S$ that terminates has exactly explored (one of) the smallest (in the number of edges) bottom SCC(s) of $G[S]$. Thus on explicit graphs the explicit lock-step search from the vertices of $H_S \cup T_S$ finds (one of) the smallest (in the number of edges) top or bottom SCC(s) of $G[S]$ in time proportional to the number of searches times the number of edges in the identified SCC. In symbolically represented graphs it can happen (1) that a search started outside of a bottom (resp. top) SCC terminates earlier than the search started within
Fig. 1. An example of symbolic lock-step search showing the first three iterations of
the main while-loop. Note that during the second iteration, the search started from $t_1$
is disregarded since it collides with $t_2$. In the subsequent fourth iteration, the search
started from tə is returned by the procedure. 


the bottom (resp. top) SCC and (2) that a search started in a larger (in the 
number of vertices) top or bottom SCC terminates before one in a smaller top 
or bottom SCC. We discuss next how we address these two challenges. 
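Challenge (1) already shows up on a tiny explicit example. In the hypothetical graph below, vertices 0 to 3 form a directed cycle that is the only bottom SCC, and vertex 4, outside it, has an edge to every cycle vertex. Counting one Post per round until the forward set stops growing, the search from 4 stops after 2 rounds, whereas the search from 0, started inside the bottom SCC, needs 4. The helper `forward_closure` is a hypothetical name used only for this illustration.

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def forward_closure(g: Graph, S: Set[int], start: int) -> Tuple[Set[int], int]:
    """Grow {start} by one Post per round until it stops changing.

    Returns the reached set and the number of Post calls (the last call
    only confirms the fixpoint).
    """
    reached, steps = {start}, 0
    while True:
        steps += 1
        grown = (reached | {w for v in reached for w in g[v]}) & S
        if grown == reached:
            return reached, steps
        reached = grown


# Hypothetical graph: 0 -> 1 -> 2 -> 3 -> 0 is the only bottom SCC,
# and vertex 4 (outside it) has an edge to every cycle vertex.
g = {0: {1}, 1: {2}, 2: {3}, 3: {0}, 4: {0, 1, 2, 3}}
S = set(g)
from_cycle = forward_closure(g, S, 0)    # ({0, 1, 2, 3}, 4)
from_outside = forward_closure(g, S, 4)  # ({0, 1, 2, 3, 4}, 2)
```

If both 0 and 4 were start vertices for bottom-SCC searches, the search from 4 would meet 0 in its first round, which is exactly the situation handled by the collision rule described next.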


Ensuring Strong Connectivity. First, we would like the set returned by Procedure
LOCK-STEP-SEARCH to indeed be a top or bottom SCC of G[S]. For this we use
the following observation for bottom SCCs, which can be applied to top SCCs
analogously. If a search starting from a vertex $t_1 \in T_S$ encounters another
vertex $t_2 \in T_S$, $t_1 \neq t_2$, there are two possibilities: either (1) both vertices are in
the same SCC or (2) $t_1$ can reach $t_2$ but not vice versa. In Case (1) the searches
from both vertices can explore all vertices in the SCC and thus it is sufficient
to search from only one of them. In Case (2) the SCC of $t_1$ has an outgoing
edge and thus cannot be a bottom SCC. Hence in both cases we can remove the
vertex $t_1$ from the set $T_S$ while still maintaining Invariant 1. By Invariant 1 we
further have that each search from a vertex of $T_S$ that is not in a bottom SCC
encounters another vertex of $T_S$ in its search and therefore is removed from the
set $T_S$ during Procedure LOCK-STEP-SEARCH (if no top or bottom SCC is found
earlier). This ensures that the returned set is either a top or a bottom SCC.¹


Bound on Symbolic Steps. Second, observe that we can still bound the number
of symbolic steps needed for the search that terminates first by the number
of vertices in the smallest top or bottom SCC of G[S], since this is an upper
bound on the symbolic steps needed for the search started in this SCC. Thus,
provided Invariant 1 holds, we can bound the number of symbolic steps in Procedure
LOCK-STEP-SEARCH to identify a vertex set $C \subsetneq S$ such that $C$ and $S \setminus C$ are
not strongly connected in G[S] by $O((|H_S| + |T_S|) \cdot \min(|C|, |S \setminus C|))$. In the
algorithms that call Procedure LOCK-STEP-SEARCH we charge the number of
symbolic steps in the procedure to the vertices in the smaller set of $C$ and $S \setminus C$;
this ensures that each vertex is charged at most $O(\log n)$ times over the whole
algorithm. We obtain the following result (proof in [13, Appendix A]).
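The $O(\log n)$ charging bound follows from the standard halving argument, sketched here for completeness: a charged vertex always lies in the smaller side of a split, and candidate sets only shrink afterwards.

```latex
% v is charged when S is split into C and S \setminus C and v lies in the smaller part X:
|X| = \min(|C|,\, |S \setminus C|) \le \tfrac{1}{2}\,|S|
% candidate sets only shrink, so after the j-th charge the set X_j containing v satisfies
1 \le |X_j| \le \frac{n}{2^{j}} \quad\Longrightarrow\quad j \le \log_2 n
```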


¹ To improve the practical performance, we return the updated sets $H_S$ and $T_S$. By
the above argument this preserves Invariant 1.




Theorem 1 (Lock-Step Search). Provided Invariant 1 holds, Procedure
LOCK-STEP-SEARCH(G, S, H_S, T_S) returns a top or bottom SCC C of
G[S]. It uses $O((|H_S| + |T_S|) \cdot \min(|C|, |S \setminus C|))$ symbolic steps if $C \neq S$ and
$O((|H_S| + |T_S|) \cdot |C|)$ otherwise.


4 Graphs with Streett Objectives 


Basic Symbolic Algorithm. Recall that for a given graph (with n vertices) 
and a Streett objective (with k target pairs) each non-trivial strongly connected 
subgraph without bad vertices is a good component. The basic symbolic algo- 
rithm for graphs with Streett objectives repeatedly removes bad vertices from 
each SCC and then recomputes the SCCs until all good components are found. 
The winning set then consists of the vertices that can reach a good component. 
We refer to this algorithm as STREETTGRAPHBASIC. For the pseudocode and 
more details see [13, Appendix B]. 
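A compact explicit-state sketch of the basic algorithm is given below. It assumes adjacency dicts and Streett pairs given as pairs of Python sets $(L_i, U_i)$; all helper names are hypothetical, and SCCs are computed by intersecting forward and backward reachability from a pivot, a common symbolic-style approach that is not necessarily the decomposition used by the authors. A candidate SCC without bad vertices that contains at least one edge is recorded as a good component, and the winning set is obtained by backward reachability to the good components.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]        # vertex -> successors; every vertex is a key
Pair = Tuple[Set[int], Set[int]]   # one Streett pair (L_i, U_i)


def reach(g: Graph, S: Set[int], src: Set[int], forward: bool) -> Set[int]:
    """Vertices of G[S] reachable from src (forward) or that can reach src (backward)."""
    seen = set(src) & S
    frontier = set(seen)
    while frontier:
        if forward:
            step = {w for v in frontier for w in g[v]}
        else:
            step = {v for v in g if g[v] & frontier}
        frontier = (step & S) - seen
        seen |= frontier
    return seen


def sccs(g: Graph, S: Set[int]) -> List[Set[int]]:
    """SCCs of G[S], each as forward ∩ backward reachability from a pivot."""
    rest, out = set(S), []
    while rest:
        v = min(rest)
        comp = reach(g, rest, {v}, True) & reach(g, rest, {v}, False)
        out.append(comp)
        rest -= comp
    return out


def bad(S: Set[int], pairs: List[Pair]) -> Set[int]:
    """Union of L_i ∩ S over all pairs with U_i ∩ S = ∅."""
    B: Set[int] = set()
    for L, U in pairs:
        if not (U & S):
            B |= L & S
    return B


def streett_graph_basic(g: Graph, pairs: List[Pair]) -> Set[int]:
    candidates = sccs(g, set(g))
    good: Set[int] = set()
    while candidates:
        S = candidates.pop()
        B = bad(S, pairs)
        if B:                                   # remove bad vertices, recompute SCCs
            candidates.extend(sccs(g, S - B))
        elif any(g[v] & S for v in S):          # strongly connected and has an edge
            good |= S
    return reach(g, set(g), good, forward=False)
```

In a small test graph where the SCC `{5, 6, 7}` has the bad vertex 7, removing 7 leaves the good component `{5, 6}`, and 7 is still winning because it can reach it.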


Proposition 1. Algorithm STREETTGRAPHBASIC correctly computes the win-
ning set in graphs with Streett objectives and requires $O(n \cdot \min(n, k))$ symbolic
steps.


Improved Symbolic Algorithm. In our improved symbolic algorithm we 
replace the recomputation of all SCCs with the search for a new top or bottom 
SCC with Procedure LOCK-STEP-SEARCH from vertices that have lost adjacent 
edges whenever there are not too many such vertices. We present the improved 
symbolic algorithm for graphs with Streett objectives in more detail as it also 
conveys important intuition for the MDP case. The pseudocode is given in Algo- 
rithm STREETTGRAPHIMPR. 


Iterative Refinement of Candidate Sets. The improved algorithm maintains a
set goodC of already identified good components that is initially empty and a
set X of candidates for good components that is initialized with the SCCs of the
input graph G. The difference from the basic algorithm lies in the properties of the
vertex sets maintained in X and the way we identify sets that can be separated
from each other without destroying a good component. In each iteration one
vertex set S is removed from X and, after the removal of bad vertices from the
set, either identified as a good component or split into several candidate sets. By
Lemma 2 and Corollary 1 the following invariant is maintained throughout the
algorithm for the sets in goodC and X.


Invariant 2 (Maintained Sets). The sets in $X \cup$ goodC are pairwise disjoint
and for every good component $C$ of G there exists a set $Y \supseteq C$ such that either
$Y \in X$ or $Y \in$ goodC.


Lost Adjacent Edges. In contrast to the basic algorithm, the subgraph induced
by a set S contained in X is not necessarily strongly connected. Instead, we
remember vertices of S that have lost adjacent edges since the last time a superset
of S was determined to induce a strongly connected subgraph; vertices that lost
incoming edges are contained in $H_S$ and vertices that lost outgoing edges are
contained in $T_S$. In this way we maintain Invariant 1 throughout the algorithm,
which enables us to use Procedure LOCK-STEP-SEARCH with the running time
guarantee provided by Theorem 1.

Algorithm STREETTGRAPHIMPR. Improved Alg. for Graphs with Streett Obj.
Input : graph G = (V, E) and Streett pairs TP = {(L_i, U_i) | 1 ≤ i ≤ k}
Output: ⟨⟨1⟩⟩(G, Streett(TP))
1  X ← ALLSCCS(G); goodC ← ∅
2  foreach C ∈ X do H_C ← ∅; T_C ← ∅
3  while X ≠ ∅ do
4      remove some S ∈ X from X
5      B ← ⋃_{1 ≤ i ≤ k : U_i ∩ S = ∅} (L_i ∩ S)
6      while B ≠ ∅ do
7          S ← S \ B
8          H_S ← (H_S ∪ Post(B)) ∩ S
9          T_S ← (T_S ∪ Pre(B)) ∩ S
10         B ← ⋃_{1 ≤ i ≤ k : U_i ∩ S = ∅} (L_i ∩ S)
11     if Post(S) ∩ S ≠ ∅ then                   /* G[S] contains at least one edge */
12         if |H_S| + |T_S| = 0 then goodC ← goodC ∪ {S}
13         else if |H_S| + |T_S| > √(m / log n) then
14             delete H_S and T_S
15             𝒞 ← ALLSCCS(G[S])
16             if |𝒞| = 1 then goodC ← goodC ∪ {S}
17             else
18                 foreach C ∈ 𝒞 do H_C ← ∅; T_C ← ∅
19                 X ← X ∪ 𝒞
20         else
21             (C, H_S, T_S) ← LOCK-STEP-SEARCH(G, S, H_S, T_S)
22             if C = S then goodC ← goodC ∪ {S}
23             else                              /* separate C and S \ C */
24                 S ← S \ C
25                 H_C ← ∅; T_C ← ∅
26                 H_S ← (H_S ∪ Post(C)) ∩ S
27                 T_S ← (T_S ∪ Pre(C)) ∩ S
28                 X ← X ∪ {S} ∪ {C}
29 return GRAPHREACH(G, ⋃_{C ∈ goodC} C)
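The inner loop of Algorithm STREETTGRAPHIMPR (lines 5-10) can be mirrored on explicit sets as below. This is a sketch under the same adjacency-dict assumption, with Pre and Post evaluated on Python sets; `remove_bad_vertices` is a hypothetical name. It shows how successors of removed vertices enter $H_S$ (they lost incoming edges) and predecessors enter $T_S$ (they lost outgoing edges), and how removing one batch of bad vertices can make further vertices bad.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]        # vertex -> successors; every vertex is a key
Pair = Tuple[Set[int], Set[int]]   # Streett pair (L_i, U_i)


def remove_bad_vertices(g: Graph, S: Set[int], H: Set[int], T: Set[int],
                        pairs: List[Pair]) -> Tuple[Set[int], Set[int], Set[int]]:
    """Mirror lines 5-10: drop bad vertices until none remain, updating H_S and T_S."""
    S, H, T = set(S), set(H), set(T)
    while True:
        B: Set[int] = set()
        for L, U in pairs:            # B = union of L_i ∩ S over pairs with U_i ∩ S = ∅
            if not (U & S):
                B |= L & S
        if not B:
            return S, H, T
        S -= B
        H = (H | {w for v in B for w in g[v]}) & S   # Post(B): lost incoming edges
        T = (T | {v for v in g if g[v] & B}) & S     # Pre(B): lost outgoing edges
```

On the cycle `{0: {1}, 1: {2}, 2: {0}}` with the single pair `({1}, {5})`, vertex 1 is bad; afterwards 2 is in $H_S$ (it lost its incoming edge from 1) and 0 is in $T_S$ (it lost its outgoing edge to 1).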


Identifying SCCs. Let S be the vertex set removed from X in a fixed iteration of
Algorithm STREETTGRAPHIMPR after the removal of bad vertices in the inner
while-loop. First note that if S is strongly connected and contains at least one
edge, then it is a good component. If the set S was already identified as strongly
connected in a previous iteration, i.e., $H_S$ and $T_S$ are empty, then S is identified
as a good component in line 12. If many vertices of S have lost adjacent edges
since the last time a superset of S was identified as a strongly connected sub-
graph, then the SCCs of G[S] are determined as in the basic algorithm. To
achieve the optimal asymptotic upper bound, we say that many vertices of S
have lost adjacent edges when we have $|H_S| + |T_S| > \sqrt{m / \log n}$, while lower
thresholds are used in our experimental results. Otherwise, if not too many ver-
tices of S lost adjacent edges, then we start a symbolic lock-step search for top
SCCs from the vertices of $H_S$ and for bottom SCCs from the vertices of $T_S$ using
Procedure LOCK-STEP-SEARCH. The set returned by the procedure is either a
top or a bottom SCC C of G[S] (Theorem 1). Therefore we can from now on
consider C and $S \setminus C$ separately, maintaining Invariants 1 and 2.
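The three-way choice made in lines 12-13 can be written as a small decision function. This is a sketch: `next_step` is a hypothetical name, the caller is assumed to have already checked that G[S] contains an edge (line 11), the natural logarithm is an assumption (the base only changes the constant), and n ≥ 2 is required. As stated above, the experiments use lower thresholds than this asymptotically optimal one.

```python
import math


def next_step(m: int, n: int, h_size: int, t_size: int) -> str:
    """Choice of lines 12-13 for a candidate S with |H_S| = h_size and |T_S| = t_size."""
    lost = h_size + t_size
    if lost == 0:
        return "good component"                 # line 12
    if lost > math.sqrt(m / math.log(n)):
        return "recompute all SCCs"             # lines 13-19
    return "lock-step search"                   # lines 20-28
```

With m = 5,000,000 and n = 1,000,000 the threshold is about 601.6, so 500 vertices with lost edges lead to a lock-step search, while 700 trigger a full SCC recomputation.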


Algorithm STREETTGRAPHIMPR. A succinct description of the pseudocode is as 
follows: Lines 1-2 initialize the set of candidates for good components with the 
SCCs of the input graph. In each iteration of the main while-loop one candidate is 
considered and the following operations are performed: (a) lines 5-10 iteratively 
remove all bad vertices; if afterwards the candidate is still strongly connected 
(and contains at least one edge), it is identified as a good component in the next 
step; otherwise it is partitioned into new candidates in one of the following ways: 
(b) if many vertices lost adjacent edges, lines 13-17 partition the candidate into 
its SCCs (this corresponds to an iteration of the basic algorithm); (c) otherwise, 
lines 20-28 use symbolic lock-step search to partition the candidate into one of its 
SCCs and the remaining vertices. The while-loop terminates when no candidates 
are left. Finally, vertices that can reach some good component are returned. We 
have the following result (proof in [13, Appendix B]).


Theorem 2 (Improved Algorithm for Graphs). Algorithm STREETTGRAPHIMPR
correctly computes the winning set in graphs with Streett objectives
and requires $O(n \cdot \sqrt{m \log n})$ symbolic steps.


5 Symbolic MEC Decomposition 


In this section we present a succinct description of the basic symbolic algo- 
rithm for MEC decomposition and then present the main ideas for the improved 
algorithm. 


Basic symbolic algorithm for MEC decomposition. The basic symbolic algorithm 
for MEC decomposition maintains a set of identified MECs and a set of candi- 
dates for MECs, initialized with the SCCs of the MDP. Whenever a candidate 
is considered, either (a) it is identified as a MEC or (b) it contains vertices 
with outgoing random edges, which are then removed together with their ran- 
dom attractor from the candidate, and the SCCs of the remaining sub-MDP are 
added to the set of candidates. We refer to the algorithm as MECBASIC. 
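The basic MEC loop can be sketched on an explicit MDP as follows. The sketch assumes an adjacency-dict MDP in which `R` is the set of random vertices, computes SCCs by forward/backward reachability, and computes the random attractor inside a candidate S by repeatedly adding random vertices with an edge into the current set and player vertices whose edges inside S all lead into it. All names are hypothetical, and discarding candidates without any edge is an assumption about how trivial SCCs are treated.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # vertex -> successors; every vertex is a key


def _reach(g: Graph, S: Set[int], src: Set[int], forward: bool) -> Set[int]:
    seen = set(src) & S
    frontier = set(seen)
    while frontier:
        if forward:
            step = {w for v in frontier for w in g[v]}
        else:
            step = {v for v in g if g[v] & frontier}
        frontier = (step & S) - seen
        seen |= frontier
    return seen


def sccs(g: Graph, S: Set[int]) -> List[Set[int]]:
    rest, out = set(S), []
    while rest:
        v = min(rest)
        comp = _reach(g, rest, {v}, True) & _reach(g, rest, {v}, False)
        out.append(comp)
        rest -= comp
    return out


def random_attractor(g: Graph, S: Set[int], R: Set[int], A: Set[int]) -> Set[int]:
    """Random attractor of A inside the sub-MDP on S."""
    Z = set(A) & S
    changed = True
    while changed:
        changed = False
        for v in S - Z:
            inside = g[v] & S
            if (v in R and inside & Z) or (v not in R and inside and inside <= Z):
                Z.add(v)
                changed = True
    return Z


def mec_basic(g: Graph, R: Set[int]) -> List[Set[int]]:
    mecs: List[Set[int]] = []
    candidates = sccs(g, set(g))
    while candidates:
        S = candidates.pop()
        leaving = {v for v in S & R if not g[v] <= S}  # random vertices with edges out of S
        if leaving:
            candidates.extend(sccs(g, S - random_attractor(g, S, R, leaving)))
        elif any(g[v] & S for v in S):                 # assumption: an MEC needs an edge
            mecs.append(S)
    return mecs
```

In the test MDP, the random vertex 1 has an edge leaving its SCC `{0, 1, 4, 5}`; its random attractor also pulls in player vertex 5 (whose only edge goes to 1), leaving the MEC `{0, 4}` next to the MEC `{2, 3}`.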


Proposition 2. Algorithm MECBASIC correctly computes the MEC decomposi-
tion of MDPs and requires $O(n^2)$ symbolic steps.
Improved Symbolic Algorithm for MEC Decomposition. The improved symbolic
algorithm for MEC decomposition uses the ideas of symbolic lock-step search
presented in Sect. 3. Informally, when considering a candidate that lost a few
edges from the remaining graph, we use the symbolic lock-step search to identify
some bottom SCC. We refer to the algorithm as MECIMPR. Since all the impor-
tant conceptual ideas regarding the symbolic lock-step search are described in
Sect. 3, we relegate the technical details to [13, Appendix C]. We summarize the
main result (proof in [13, Appendix C]).


Theorem 3 (Improved Algorithm for MEC). Algorithm MECIMPR cor-
rectly computes the MEC decomposition of MDPs and requires $O(n \cdot \sqrt{m})$ sym-
bolic steps.


6 MDPs with Streett Objectives


Basic Symbolic Algorithm. We refer to the basic symbolic algorithm for 
MDPs with Streett objectives as STREETTMDPBASIC, which is similar to the 
algorithm for graphs, with SCC computation replaced by MEC computation. 
The pseudocode of Algorithm STREETTMDPBASIC together with its detailed
description is presented in [13, Appendix D]. 


Proposition 3. Algorithm STREETTMDPBASIC correctly computes the almost-
sure winning set in MDPs with Streett objectives and requires $O(n^2 \cdot \min(n, k))$
symbolic steps.


Remark. The above bound uses the basic symbolic MEC decomposition algo-
rithm. Using our improved symbolic MEC decomposition algorithm, the above
bound could be improved to $O(n \cdot \sqrt{m} \cdot \min(n, k))$.


Improved Symbolic Algorithm. We refer to the improved symbolic algorithm 
for MDPs with Streett objectives as STREETTMDPIMPR. First we present the
main ideas for the improved symbolic algorithm. Then we explain the key dif- 
ferences compared to the improved symbolic algorithm for graphs. A thorough 
description with the technical details and proofs is presented in [13, Appendix D]. 


- First, we improve the algorithm by interleaving the symbolic MEC compu-
  tation with the detection of bad vertices [7,31]. This allows us to replace the
  computation of MECs in each iteration of the while-loop with the computa-
  tion of SCCs and an additional random attractor computation.

  • Intuition of interleaved computation. Consider a candidate S for a good
    end-component after a random attractor to some bad vertices is removed
    from it. After the removal of the random attractor, the set S does not have
    random vertices with outgoing edges. Suppose further that BAD(S) = ∅
    holds. If S is strongly connected and contains an edge, then it is a good
    end-component. If S is not strongly connected, then P[S] contains at least
    two SCCs and some of them might have random vertices with outgoing
    edges. Since end-components are strongly connected and do not have
    random vertices with outgoing edges, we have that (1) every good end-
    component is completely contained in one of the SCCs of P[S] and (2)
    the random vertices of an SCC with outgoing edges and their random
    attractor do not intersect with any good end-component (see Lemma 2).
  • Modification from basic to improved algorithm. We use these observations
    to modify the basic algorithm as follows: First, for the sets that are can-
    didates for good end-components, we do not maintain the property that
    they are end-components, but only that they do not have random ver-
    tices with outgoing edges (it still holds that every maximal good end-
    component is either already identified or contained in one of the candi-
    date sets). Second, for a candidate set S, we repeat the removal of bad
    vertices until BAD(S) = ∅ holds before we continue with the next step of
    the algorithm. This allows us to make progress after the removal of bad
    vertices by computing all SCCs (instead of MECs) of the remaining sub-
    MDP. If there is only one SCC, then this is a good end-component (if it
    contains at least one edge). Otherwise (a) we remove from each SCC the
    set of random vertices with outgoing edges and their random attractor
    and (b) add the remaining vertices of each SCC as a new candidate set.
- Second, as for the improved symbolic algorithm for graphs, we use the sym-
  bolic lock-step search to quickly identify a top or bottom SCC every time a
  candidate has lost a small number of edges since the last time its superset
  was identified as being strongly connected. The symbolic lock-step search is
  described in detail in Sect. 3.


Using interleaved MEC computation and lock-step search leads to a simi- 
lar algorithmic structure for Algorithm STREETTMDPIMPR as for our improved 
symbolic algorithm for graphs (Algorithm STREETTGRAPHIMPR). The key dif- 
ferences are as follows: First, the set of candidates for good end-components 
is initialized with the MECs of the input graph instead of the SCCs. Second, 
whenever bad vertices are removed from a candidate, their random attractor is
removed as well. Further, whenever a candidate is partitioned into its SCCs, for
each SCC, the random attractor of the vertices with outgoing random edges 
is removed. Finally, whenever a candidate S is separated into C and S\C via 
symbolic lock-step search, the random attractor of the vertices with outgoing 
random edges is removed from C, and the random attractor of C is removed 
from S. 


Theorem 4 (Improved Algorithm for MDPs). Algorithm STREETTMDPIMPR
correctly computes the almost-sure winning set in MDPs with Streett
objectives and requires $O(n \cdot \sqrt{m \log n})$ symbolic steps.


7 Experiments 


We present a basic prototype implementation of our algorithm and com- 
pare against the basic symbolic algorithm for graphs and MDPs with Streett 
objectives. 
Models. We consider the academic benchmarks from the VLTS benchmark
suite [21], which gives representative examples of systems with nondeterminism,
and has been used in previous experimental evaluations (such as [4,11]).


Specifications. We consider random LTL formulae and use the tool Rabinizer [28] 
to obtain deterministic Rabin automata. Then the negations of the formulae give 
us Streett automata, which we consider as the specifications. 


Graphs. For the models of the academic benchmarks, we first compute SCCs, 
as all algorithms for Streett objectives compute SCCs as a preprocessing step. 
For SCCs of the model benchmarks we consider products with the specification 
Streett automata, to obtain graphs with Streett objectives, which are the bench- 
mark examples for our experimental evaluation. The number of transitions in 
the benchmarks ranges from 300K to 5 million.
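The product step can be sketched with the textbook construction for a labelled model and a deterministic Streett automaton. This is not necessarily the authors' tool chain: the labelling convention (the automaton reads the label of the state being entered), the dictionary encodings, and the name `streett_product` are all assumptions. Only product states reachable from the initial states are built, and each Streett pair over automaton states is lifted to the product states whose automaton component lies in it.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Node = Tuple[Hashable, Hashable]   # (model state, automaton state)


def streett_product(succ: Dict[Hashable, Set[Hashable]],
                    label: Dict[Hashable, Hashable],
                    init: Iterable[Hashable],
                    delta: Dict[Tuple[Hashable, Hashable], Hashable],
                    q0: Hashable,
                    pairs: List[Tuple[Set[Hashable], Set[Hashable]]]):
    """Reachable product graph plus Streett pairs lifted to product states."""
    queue = deque((s, delta[(q0, label[s])]) for s in init)
    g: Dict[Node, Set[Node]] = {}
    while queue:
        s, q = queue.popleft()
        if (s, q) in g:
            continue
        g[(s, q)] = {(t, delta[(q, label[t])]) for t in succ[s]}
        queue.extend(g[(s, q)])
    lifted = [({v for v in g if v[1] in L}, {v for v in g if v[1] in U})
              for L, U in pairs]
    return g, lifted
```

The resulting graph and lifted pairs have exactly the shape assumed by the graph sketches earlier in this section.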


MDPs. For MDPs, we consider the graphs obtained as above and consider a 
fraction of the vertices of the graph as random vertices, which is chosen uniformly 
at random. We consider 10%, 20%, and 50% of the vertices as random vertices
in different experimental evaluations.
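The text does not say whether the fraction of random vertices is exact or obtained by an independent coin flip per vertex. The sketch below assumes an exact-size subset drawn uniformly at random with Python's `random.Random.sample`; `pick_random_vertices` is a hypothetical name, and the seed parameter only serves reproducibility.

```python
import random
from typing import Hashable, Iterable, Optional, Set


def pick_random_vertices(vertices: Iterable[Hashable], fraction: float,
                         seed: Optional[int] = None) -> Set[Hashable]:
    """Choose round(fraction * |V|) vertices uniformly at random as random vertices."""
    pool = list(vertices)
    rng = random.Random(seed)
    return set(rng.sample(pool, round(fraction * len(pool))))
```

For the three settings above, one would call it with `fraction` set to 0.1, 0.2, and 0.5 on the vertex set of each product graph.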


[Figure: scatter plot of Improved PrePost (y-axis) against Basic PrePost (x-axis);
legend: Basic better: 36, Improved better: 913, I/B arit. mean: ~59.6%,
I/B geo. mean: ~35.3%]

Fig. 2. Results for graphs with Streett objectives.


Experimental Evaluation. In the experimental evaluation we compare the num-
ber of symbolic steps (i.e., the number of Pre/Post operations²) executed by
the algorithms; the comparison of running time yields similar results and is pro-
vided in [13, Appendix E]. As the initial preprocessing step is the same for all the
algorithms (computing all SCCs for graphs and all MECs for MDPs), the com-
parison presents the number of symbolic steps executed after the preprocessing.
The experimental results for graphs are shown in Fig. 2 and the experimental
results for MDPs are shown in Fig. 3 (in each figure the two lines represent
equality and an order-of-magnitude improvement, respectively).
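Per-instance comparisons of this kind are often summarised as in the legend of Fig. 2: how often each algorithm needs fewer steps, and the arithmetic and geometric means of the Improved/Basic (I/B) ratio. The paper does not state how its summary was computed, so the sketch below is an assumption; `summarize` is a hypothetical name, step counts are assumed positive, and the numbers in the usage are toy values, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def summarize(basic: Sequence[int],
              improved: Sequence[int]) -> Tuple[float, float, int, int, int]:
    """Means of Improved/Basic step ratios and simple counts (all counts > 0)."""
    ratios = [i / b for b, i in zip(basic, improved)]
    arith = sum(ratios) / len(ratios)
    geo = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    improved_better = sum(r < 1 for r in ratios)
    basic_better = sum(r > 1 for r in ratios)
    tenfold = sum(r <= 0.1 for r in ratios)  # on or below the order-of-magnitude line
    return arith, geo, improved_better, basic_better, tenfold
```

The geometric mean is less sensitive than the arithmetic mean to a few instances with large ratios, which is one reason both are commonly reported.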


Discussion. Note that the lock-step search is the key reason for the theoretical
improvement; however, the improvement relies on a large number of Streett pairs.


² Recall that the basic set operations are cheaper to compute, and their number is
asymptotically at most the number of Pre/Post operations in all the presented algorithms.
Fig. 3. Results for MDPs with Streett objectives.


In the experimental evaluation, the LTL formulae generate Streett automata
with a small number of pairs, which after the product with the model accounts for
an even smaller fraction of pairs as compared to the size of the state space. This
has two effects:


- In the experiments the lock-step search is performed for a much smaller param-
  eter value ($O(\log n)$ instead of the theoretically optimal bound of $\sqrt{m / \log n}$),
  and leads to a small improvement.

- For large graphs, since the number of pairs is small as compared to the number
  of states, the improvement over the basic algorithm is minimal.


In contrast to graphs, in MDPs, even with a small number of pairs compared
to the size of the state space, the interleaved MEC computation has a notable
effect on practical performance, and we observe performance improvements even
in large MDPs.


8 Conclusion 


In this work we consider symbolic algorithms for graphs and MDPs with Streett
objectives, as well as for MEC decomposition. Our algorithmic bounds match
for both graphs and MDPs. In contrast, while SCCs can be computed in linearly
many symbolic steps, no such algorithm is known for MEC decomposition. An
interesting direction of future work would be to explore further improved sym-
bolic algorithms for MEC decomposition. Moreover, further improved symbolic
algorithms for graphs and MDPs with Streett objectives are also an interesting
direction of future work.
Abstract. Parity games have important practical applications in formal
verification and synthesis, especially to solve the model-checking problem 
of the modal mu-calculus. They are also interesting from the theory 
perspective, because they are widely believed to admit a polynomial 
solution, but so far no such algorithm is known. 

We propose a new algorithm to solve parity games based on learning 
tangles, which are strongly connected subgraphs for which one player has 
a strategy to win all cycles in the subgraph. We argue that tangles play 
a fundamental role in the prominent parity game solving algorithms. We 
show that tangle learning is competitive in practice and the fastest solver 
for large random games. 


1 Introduction 


Parity games are turn-based games played on a finite graph. Two players Odd 
and Even play an infinite game by moving a token along the edges of the graph. 
Each vertex is labeled with a natural number priority and the winner of the 
game is determined by the parity of the highest priority that is encountered 
infinitely often. Player Odd wins if this parity is odd; otherwise, player Even 
wins. 

Parity games are interesting both for their practical applications and for 
complexity theoretic reasons. Their study has been motivated by their relation 
to many problems in formal verification and synthesis that can be reduced to the 
problem of solving parity games, as parity games capture the expressive power 
of nested least and greatest fixpoint operators [11]. In particular, deciding the 
winner of a parity game is polynomial-time equivalent to checking non-emptiness 
of non-deterministic parity tree automata [21], and to the explicit model-checking 
problem of the modal μ-calculus [9, 15, 20].

Parity games are interesting in complexity theory, as the problem of deter-
mining the winner of a parity game is known to lie in UP ∩ co-UP [16], which
is contained in NP ∩ co-NP [9]. This problem is therefore unlikely to be NP-
complete and it is widely believed that a polynomial solution exists. Despite
much effort, such an algorithm has not been found yet.
The main contribution of this paper is based on the notion of a tangle.
A tangle is a strongly connected subgraph of a parity game for which one of
the players has a strategy to win all cycles in the subgraph. We propose this
notion and discuss its relation to dominions and cycles in a parity game. Tangles are
related to snares [10] and quasi-dominions [3], with the critical difference that 
tangles are strongly connected, whereas snares and quasi-dominions may be 
unconnected as well as contain vertices that are not in any cycles. We argue 
that tangles play a fundamental role in various parity game algorithms, in par- 
ticular in priority promotion [3,5], Zielonka’s recursive algorithm [25], strategy 
improvement [10,11,24], small progress measures [17], and in the recently pro- 
posed quasi-polynomial time progress measures [6,12]. 

The core insight of this paper is that tangles can be used to attract sets 
of vertices at once, since the losing player is forced to escape a tangle. This 
leads to a novel algorithm to solve parity games called tangle learning, which 
is based on searching for tangles along a top-down α-maximal decomposition of
the parity game. New tangles are then attracted in the next decomposition. This 
naturally leads to learning nested tangles and, eventually, finding dominions. We 
prove that tangle learning solves parity games and present several extensions to 
the core algorithm, including alternating tangle learning, where the two players 
take turns maximally searching for tangles in their regions, and on-the-fly tangle 
learning, where newly learned tangles immediately refine the decomposition. 

We relate the complexity of tangle learning to the number of learned tangles 
before finding a dominion, which is related to how often the solver is distracted 
by paths to higher winning priorities that are not suitable strategies. 

We evaluate tangle learning in a comparison based on the parity game solver 
Oink [7], using the benchmarks of Keiren [19] as well as random parity games 
of various sizes. We compare tangle learning to priority promotion [3,5] and to 
Zielonka’s recursive algorithm [25] as implemented in Oink. 


2 Preliminaries 


Parity games are two-player turn-based infinite-duration games over a finite 
directed graph G = (V, E), where every vertex belongs to exactly one of two 
players called player Even and player Odd, and where every vertex is assigned a 
natural number called the priority. Starting from some initial vertex, a play of 
both players is an infinite path in G where the owner of each vertex determines 
the next move. The winner of such an infinite play is determined by the parity 
of the highest priority that occurs infinitely often along the play. 

More formally, a parity game $\Game$ is a tuple $(V_0, V_1, E, pr)$ where $V = V_0 \cup V_1$
is a set of vertices partitioned into the sets $V_0$ controlled by player Even and $V_1$
controlled by player Odd, and $E \subseteq V \times V$ is a left-total binary relation describing
all moves, i.e., every vertex has at least one successor. We also write $E(u)$ for
all successors of $u$ and $u \to v$ for $v \in E(u)$. The function $pr\colon V \to \{0, 1, \ldots, d\}$
assigns to each vertex a priority, where $d$ is the highest priority in the game.

We write $pr(v)$ for the priority of a vertex $v$, $pr(V)$ for the highest priority
of vertices $V$, and $pr(\Game)$ for the highest priority in the game $\Game$. Furthermore, we
write $pr^{-1}(i)$ for all vertices with the priority $i$. A path $\pi = v_0 v_1 \ldots$ is a sequence
of vertices consistent with $E$, i.e., $v_i \to v_{i+1}$ for all successive vertices. A play
is an infinite path. We denote with $inf(\pi)$ the vertices in $\pi$ that occur infinitely
many times in $\pi$. Player Even wins a play $\pi$ if $pr(inf(\pi))$ is even; player Odd wins
if $pr(inf(\pi))$ is odd. We write $Plays(v)$ to denote all plays starting at vertex $v$.

A strategy $\sigma\colon V \to V$ is a partial function that assigns to each vertex in its
domain a single successor in $E$, i.e., $\sigma \subseteq E$. We refer to a strategy of player
$\alpha$ to restrict the domain of $\sigma$ to $V_\alpha$. In the remainder, all strategies $\sigma$ are of a
player $\alpha$. We write $Plays(v, \sigma)$ for the set of plays from $v$ consistent with $\sigma$, and
$Plays(V, \sigma)$ for $\{\pi \in Plays(v, \sigma) \mid v \in V\}$.

A fundamental result for parity games is that they are memoryless deter-
mined [8], i.e., each vertex is either winning for player Even or for player Odd,
and both players have a strategy for their winning vertices. Player $\alpha$ wins vertex
$v$ if they have a strategy $\sigma$ such that all plays in $Plays(v, \sigma)$ are winning for
player $\alpha$.

Several algorithms for solving parity games employ attractor computation.
Given a set of vertices $A$, the attractor of $A$ for a player $\alpha$ represents those
vertices from which player $\alpha$ can force a play to visit $A$. We write $Attr^{\Game}_{\alpha}(A)$ to
attract vertices in $\Game$ to $A$ as player $\alpha$, i.e.,

$$Attr^{\Game}_{\alpha}(A) := \mu Z.\, A \cup \{v \in V_\alpha \mid E(v) \cap Z \neq \emptyset\} \cup \{v \in V_{\bar\alpha} \mid E(v) \subseteq Z\}$$


Informally, we compute the $\alpha$-attractor of $A$ with a backward search from $A$,
initially setting $Z := A$ and iteratively adding $\alpha$-vertices with a successor in $Z$
and $\bar\alpha$-vertices with no successors outside $Z$. We also obtain a strategy $\sigma$ for
player $\alpha$, starting with an empty strategy, by selecting a successor in $Z$ when we
attract vertices of player $\alpha$ and when the backward search finds a successor in $Z$
for the $\alpha$-vertices in $A$. We call a set of vertices $A$ $\alpha$-maximal if $A = Attr^{\Game}_{\alpha}(A)$.
A dominion $D$ is a set of vertices for which player $\alpha$ has a strategy $\sigma$ such that
all plays consistent with $\sigma$ stay in $D$ and are winning for player $\alpha$. We also write a
$p$-dominion for a dominion where $p$ is the highest priority encountered infinitely
often in plays consistent with $\sigma$, i.e., $p := \max\{pr(inf(\pi)) \mid \pi \in Plays(D, \sigma)\}$.
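The backward search for $Attr^{\Game}_{\alpha}(A)$ together with its strategy can be sketched as follows. The sketch assumes players 0 (Even) and 1 (Odd) and a game given as `owner` and `succ` dictionaries; `attractor` is a hypothetical helper. Opponent vertices are attracted once a counter of their successors outside $Z$ drops to zero, which is the usual linear-time formulation of the fixpoint above.

```python
from collections import deque
from typing import Dict, Set, Tuple


def attractor(owner: Dict[int, int], succ: Dict[int, Set[int]],
              A: Set[int], alpha: int) -> Tuple[Set[int], Dict[int, int]]:
    """Attr_alpha(A) by backward search, plus an alpha-strategy (vertex -> successor)."""
    pred: Dict[int, Set[int]] = {v: set() for v in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    Z = set(A)
    strategy: Dict[int, int] = {}
    outside = {v: len(succ[v]) for v in succ}   # opponent: successors not yet in Z
    queue = deque(sorted(A))
    while queue:
        v = queue.popleft()
        for u in sorted(pred[v]):
            if owner[u] == alpha:
                if u not in Z:                  # alpha can move from u into Z
                    Z.add(u)
                    strategy[u] = v
                    queue.append(u)
                elif u in A and u not in strategy:
                    strategy[u] = v             # alpha-vertex of A with a successor in Z
            elif u not in Z:
                outside[u] -= 1
                if outside[u] == 0:             # every move of the opponent enters Z
                    Z.add(u)
                    queue.append(u)
    return Z, strategy
```

In the test game, Even (player 0) attracts Odd's vertex 1 only after both of its successors, 0 and 3, have entered the attractor, while vertex 4 with its self-loop stays outside.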


3 Tangles 


Definition 1. A $p$-tangle is a nonempty set of vertices $U \subseteq V$ with $p = pr(U)$,
for which player $\alpha = p \bmod 2$ has a strategy $\sigma\colon U_\alpha \to U$, such that the graph $(U, E')$,
with $E' := E \cap (\sigma \cup (U_{\bar\alpha} \times U))$, is strongly connected and player $\alpha$ wins all cycles
in $(U, E')$.
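To make Definition 1 concrete, the following sketch checks whether a given set U with a candidate witness strategy is a tangle in an explicit game. It assumes players 0 (Even) and 1 (Odd), `owner`/`succ`/`prio` dictionaries, and a strategy given as a dict from α-vertices to successors; `is_tangle` is a hypothetical helper. The cycle condition is tested through the observation that α loses some cycle of $(U, E')$ exactly when a vertex whose priority has the wrong parity lies on a cycle through vertices of no higher priority.

```python
from typing import Dict, Set


def is_tangle(owner: Dict[int, int], succ: Dict[int, Set[int]], prio: Dict[int, int],
              U: Set[int], sigma: Dict[int, int]) -> bool:
    """Check Definition 1 for U with witness strategy sigma of alpha = pr(U) mod 2."""
    if not U:
        return False
    alpha = max(prio[v] for v in U) % 2
    for v in U:   # sigma must choose a successor inside U for every alpha-vertex of U
        if owner[v] == alpha and not (v in sigma and sigma[v] in succ[v] & U):
            return False
    # E' = E ∩ (sigma ∪ (U_opponent × U))
    Ep = {v: ({sigma[v]} if owner[v] == alpha else succ[v] & U) for v in U}

    def reach(start: int, edges: Dict[int, Set[int]], allowed: Set[int]) -> Set[int]:
        seen, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(w for w in edges[x] if w in allowed)
        return seen

    rev: Dict[int, Set[int]] = {v: set() for v in U}
    for v, ws in Ep.items():
        for w in ws:
            rev[w].add(v)
    v0 = min(U)
    if reach(v0, Ep, U) != U or reach(v0, rev, U) != U:
        return False                       # (U, E') is not strongly connected
    for v in U:                            # look for a cycle whose top priority is prio[v]
        if prio[v] % 2 != alpha:
            low = {w for w in U if prio[w] <= prio[v]}
            if any(v in reach(w, Ep, low) for w in Ep[v] & low):
                return False
    return True
```

In the test, Odd owns vertex 0 (priority 3) and Even owns vertex 1 (priority 2): `{0, 1}` is a 3-tangle, but giving vertex 1 a self-loop creates a cycle won by Even, so the set is no longer a tangle.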


Informally, a tangle is a set of vertices for which player $\alpha$ has a strategy to
win all cycles inside the tangle. Thus, player $\bar\alpha$ loses all plays that stay in $U$ and
is therefore forced to escape the tangle. The highest priority by which player $\alpha$
wins a play in $(U, E')$ is $p$. We make several basic observations related to tangles.
1. A $p$-tangle from which player $\bar\alpha$ cannot leave is a $p$-dominion.
2. Every $p$-dominion contains one or more $p$-tangles.
3. Tangles may contain tangles of a lower priority.


Observation 1 follows by definition. Observation 2 follows from the fact that
dominions won by player $\alpha$ with some strategy $\sigma$ must contain strongly connected
subgraphs where all cycles are won by player $\alpha$ and the highest winning priority
is $p$. For observation 3, consider a $p$-tangle for which player $\bar\alpha$ has a strategy
that avoids priority $p$ while staying in the tangle. Then there is a $p'$-tangle with
$p' < p$ in which player $\bar\alpha$ also loses.

We can in fact find a hierarchy of tangles in any dominion $D$ with winning strategy
$\sigma$ by computing the set of winning priorities $\{pr(inf(\pi)) \mid \pi \in Plays(D, \sigma)\}$.
There is a $p$-tangle in $D$ for every $p$ in this set. Tangles are thus a natural
substructure of dominions.

See for example Fig. 1. Player Odd wins this dominion with highest priority 5
and strategy $\{d \to e\}$. Player Even can also avoid priority 5 and then loses with
priority 3. The 5-dominion $\{a, b, c, d, e\}$ contains the 5-tangle $\{b, c, d, e\}$ and
the 3-tangle $\{c, e\}$.

Fig. 1. A 5-dominion with a 5-tangle and a 3-tangle


4 Solving by Learning Tangles 


Since player $\bar\alpha$ must escape tangles won by player $\alpha$, we can treat a tangle as an
abstract vertex controlled by player $\bar\alpha$ that can be attracted by player $\alpha$, thus
attracting all vertices of the tangle. This section proposes the tangle learning
algorithm, which searches for tangles along a top-down $\alpha$-maximal decomposi-
tion of the game. We extend the attractor to attract all vertices in a tangle when
player $\bar\alpha$ is forced to play from the tangle to the attracting set. After extracting
new tangles from regions in the decomposition, we iteratively repeat the pro-
cedure until a dominion is found. We show that tangle learning solves parity
games.


4.1 Attracting Tangles 


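Given a tangle $t$, we denote its vertices simply by $t$ and its witness strategy by
$\sigma_T(t)$. We write $E_T(t)$ for the edges from $\bar\alpha$-vertices in the tangle to the rest of
the game: $E_T(t) := \{v \mid u \to v \wedge u \in t \cap V_{\bar\alpha} \wedge v \in V \setminus t\}$. We write $T_1$ for all
tangles where $pr(t)$ is odd (won by player Odd) and $T_0$ for all tangles where $pr(t)$
is even. We write $TAttr^{\Game,T}_{\alpha}(A)$ to attract vertices in $\Game$ and vertices of tangles in
$T$ to $A$ as player $\alpha$, i.e.,

$$TAttr^{\Game,T}_{\alpha}(A) := \mu Z.\, A \cup \{v \in V_\alpha \mid E(v) \cap Z \neq \emptyset\} \cup \{v \in V_{\bar\alpha} \mid E(v) \subseteq Z\} \cup \{v \in t \mid t \in T_\alpha \wedge E_T(t) \neq \emptyset \wedge E_T(t) \subseteq Z\}$$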
1  def solve(⅁):
2      W_0 ← ∅, W_1 ← ∅, σ_0 ← ∅, σ_1 ← ∅, T ← ∅
3      while ⅁ ≠ ∅:
4          T, d ← search(⅁, T)
5          α ← pr(d) mod 2
6          D, σ ← Attr^⅁_α(d)
7          W_α ← W_α ∪ D, σ_α ← σ_α ∪ σ_T(d) ∪ σ
8          ⅁ ← ⅁ \ D, T ← T ∩ (⅁ \ D)
9      return W_0, W_1, σ_0, σ_1


Algorithm 1. The solve algorithm which computes the winning regions and 
winning strategies for both players of a given parity game. 


This approach is not the same as the subset construction. Indeed, we do not
add the tangle itself but rather add all its vertices together. Notice that this
attractor does not guarantee arrival in $A$, as player $\bar\alpha$ can stay in the added
tangle, but then player $\bar\alpha$ loses.

To compute a witness strategy $\sigma$ for player $\alpha$, as with $\mathit{Attr}_\alpha$, we select a
successor in $Z$ when attracting single vertices of player $\alpha$ and when we find a
successor in $Z$ for the $\alpha$-vertices in $A$. When we attract vertices of a tangle, we
update $\sigma$ for each tangle $t$ sequentially, by updating $\sigma$ with the strategy in $\sigma_T(t)$
of those $\alpha$-vertices in the tangle for which we do not yet have a strategy in $\sigma$,
i.e., $\{(u,v) \in \sigma_T(t) \mid u \notin \mathrm{dom}(\sigma)\}$. This is important since tangles can overlap.
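To make the tangle attractor concrete, here is a minimal Python sketch of $\mathit{TAttr}^{\Game,T}_\alpha$ together with the strategy bookkeeping described above. It is a naive fixed-point loop for small examples, not the efficient implementation discussed in Sect. 5. The game encoding (dicts `owner` and `succ`, with 0 for Even and 1 for Odd), the tangle triple (vertex set, witness strategy, priority) and the function name `tangle_attractor` are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Illustrative encoding (an assumption, not Oink's data structures):
# owner[v] is 0 (Even) or 1 (Odd), succ[v] lists the successors of v, and a
# tangle is (vertex set, witness strategy of its alpha-vertices, priority).
Tangle = Tuple[FrozenSet[int], Dict[int, int], int]


def tangle_attractor(game: Set[int], owner: Dict[int, int],
                     succ: Dict[int, List[int]], alpha: int,
                     target: Set[int], tangles: List[Tangle]):
    """Naive fixed point for TAttr_alpha(target) in `game`, returning (Z, sigma)."""
    Z: Set[int] = set(target)
    sigma: Dict[int, int] = {}
    changed = True
    while changed:
        changed = False
        # Single vertices: alpha needs one move into Z, the opponent has no way out.
        for v in game - Z:
            moves = [w for w in succ[v] if w in game]
            if owner[v] == alpha:
                hit = next((w for w in moves if w in Z), None)
                if hit is not None:
                    Z.add(v)
                    sigma[v] = hit
                    changed = True
            elif all(w in Z for w in moves):
                Z.add(v)
                changed = True
        # Tangles of player alpha whose escapes E_T(t) are non-empty and inside Z.
        for verts, witness, p in tangles:
            if p % 2 != alpha or verts <= Z:
                continue
            escapes = {w for v in verts if owner[v] != alpha
                       for w in succ[v] if w in game and w not in verts}
            if escapes and escapes <= Z:
                Z |= verts
                for u, w in witness.items():
                    sigma.setdefault(u, w)  # keep earlier choices: tangles may overlap
                changed = True
        # alpha-vertices of the target get a successor in Z if they have one.
        for v in target:
            if owner[v] == alpha and v not in sigma:
                hit = next((w for w in succ[v] if w in game and w in Z), None)
                if hit is not None:
                    sigma[v] = hit
    return Z, sigma


# Toy game: Even owns 1 and 3, Odd owns 0 and 2; {1, 2} is an Even 2-tangle
# (say pr(1) = 2, pr(2) = 1) from which Odd can only escape via 2 -> 3.
owner = {0: 1, 1: 0, 2: 1, 3: 0}
succ = {0: [1], 1: [2], 2: [1, 3], 3: [3]}
tangle = (frozenset({1, 2}), {1: 2}, 2)
print(tangle_attractor({0, 1, 2, 3}, owner, succ, 0, {3}, [tangle]))
print(tangle_attractor({0, 1, 2, 3}, owner, succ, 0, {3}, []))
```

With the tangle, vertices 1 and 2 are attracted together (and then 0), although Odd's vertex 2 has a move outside the target; without the tangle, only the target itself is attracted.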

In the following, we call a set of vertices $A$ $\alpha$-maximal if $A = \mathit{TAttr}^{\Game,T}_{\alpha}(A)$.
Given a game $\Game$ and a set of vertices $U$, we write $\Game \cap U$ for the subgame $\Game'$ where
$V' := V \cap U$ and $E' := E \cap (V' \times V')$. Given a set of tangles $T$ and a set of vertices
$U$, we write $T \cap U$ for all tangles with all vertices in $U$, i.e., $\{t \in T \mid t \subseteq U\}$, and
we extend this notation to $T \cap \Game'$ for the tangles in the game $\Game'$, i.e., $T \cap V'$.


4.2 The solve Algorithm 


We solve parity games by iteratively searching and removing a dominion of the
game, as in [3,18,22]. See Algorithm 1. The search algorithm (described below)
is given a game and a set of tangles and returns an updated set of tangles and a
tangle $d$ that is a dominion. Since the dominion $d$ is a tangle, we derive the winner
$\alpha$ from its highest priority (line 5) and use standard attractor computation to
compute a dominion $D$ (line 6). We add the dominion to the winning region
of player $\alpha$ (line 7). We also update the winning strategy of player $\alpha$ using the
witness strategy of the tangle $d$ plus the strategy $\sigma$ obtained during attractor
computation. To solve the remainder, we remove all solved vertices from the
game and we remove all tangles that contain solved vertices (line 8). When the
entire game is solved, we return the winning regions and winning strategies of
both players (line 9). Reusing the (pruned) set of tangles for the next search
call is optional; if search is always called with an empty set of tangles, the
“forgotten” tangles would be found again.
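Lines 5 to 8 of Algorithm 1, which turn a dominion returned by search into solved vertices, can be sketched in Python as follows. The search procedure itself is not reproduced: the dominion `d` is passed in directly. The names `attractor` and `solve_step`, the dict-based game encoding and the (vertex set, witness strategy) tangle pair are hypothetical choices made for this illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Illustrative encoding (an assumption): a tangle is (vertex set, witness strategy).
Tangle = Tuple[FrozenSet[int], Dict[int, int]]


def attractor(game: Set[int], owner: Dict[int, int],
              succ: Dict[int, List[int]], alpha: int, target: Set[int]):
    """Standard attractor Attr_alpha(target) in `game` with a strategy for alpha."""
    Z: Set[int] = set(target)
    sigma: Dict[int, int] = {}
    changed = True
    while changed:
        changed = False
        for v in game - Z:
            moves = [w for w in succ[v] if w in game]
            if owner[v] == alpha:
                hit = next((w for w in moves if w in Z), None)
                if hit is not None:
                    Z.add(v)
                    sigma[v] = hit
                    changed = True
            elif all(w in Z for w in moves):
                Z.add(v)
                changed = True
    return Z, sigma


def solve_step(game: Set[int], owner, succ, prio, d: Tangle,
               W: Dict[int, Set[int]], strategies: Dict[int, Dict[int, int]],
               tangles: List[Tangle]):
    """Lines 5-8 of Algorithm 1 for one dominion d; W and strategies change in place."""
    verts, witness = d
    alpha = max(prio[v] for v in verts) % 2                     # line 5: pr(d) mod 2
    D, sigma = attractor(game, owner, succ, alpha, set(verts))  # line 6
    W[alpha] |= D                                               # line 7: winning region
    strategies[alpha].update(witness)                           # witness strategy of d
    strategies[alpha].update(sigma)                             # plus attractor strategy
    rest = game - D                                             # line 8: drop solved vertices
    return rest, [t for t in tangles if t[0] <= rest]           # and tangles touching them


owner = {0: 0, 1: 1, 2: 0, 3: 1}
prio = {0: 2, 1: 1, 2: 3, 3: 5}
succ = {0: [0], 1: [0], 2: [1, 2], 3: [3]}
W, strategies = {0: set(), 1: set()}, {0: {}, 1: {}}
rest, kept = solve_step({0, 1, 2, 3}, owner, succ, prio, (frozenset({0}), {0: 0}),
                        W, strategies, [(frozenset({3}), {}), (frozenset({0}), {0: 0})])
print(W[0], strategies[0], rest, kept)
```

The Even self-loop at vertex 0 is the dominion; the attractor adds 1 and 2, and only the Odd tangle {3} survives the pruning.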
1  def search(⅁, T):
2      while true:
3          r ← ∅, Y ← ∅
4          while ⅁ \ r ≠ ∅:
5              ⅁' ← ⅁ \ r, T' ← T ∩ (⅁ \ r)
6              p ← pr(⅁'), α ← pr(⅁') mod 2
7              Z, σ ← TAttr_α^{⅁',T'}({v ∈ ⅁' | pr(v) = p})
8              A ← extract-tangles(Z, σ)
9              if ∃t ∈ A: E_T(t) = ∅: return T ∪ Y, t
10             r ← r ∪ (Z ↦ p), Y ← Y ∪ A
11         T ← T ∪ Y


Algorithm 2. The search algorithm which, given a game and a set of tangles, 
returns the updated set of tangles and a tangle that is a dominion. 


4.3 The search Algorithm 


The search algorithm is given in Algorithm 2. The algorithm iteratively computes
a top-down decomposition of $\Game$ into sets of vertices called regions such
that each region is $\alpha$-maximal for the player $\alpha$ who wins the highest priority
in the region. Each next region in the remaining subgame $\Game'$ is obtained by
taking all vertices with the highest priority $p$ in $\Game'$ and computing the tangle
attractor set of these vertices for the player that wins that priority, i.e., player
$\alpha \equiv p \pmod 2$. As every next region has a lower priority, each region is associated with
a unique priority $p$. We record the current region of each vertex in an auxiliary
partial function $r \colon V \to \{0,1,\ldots,d\}$ called the region function. We record the
new tangles found during each decomposition in the set $Y$.

In each iteration of the decomposition, we first obtain the current subgame
$\Game'$ (line 5) and the top priority $p$ in $\Game'$ (line 6). We compute the next region by
attracting (with tangles) to the vertices of priority $p$ in $\Game'$ (line 7). We use the
procedure extract-tangles (described below) to obtain new tangles from the
computed region (line 8). For each new tangle $t$, we check whether the set of outgoing
edges to the full game $E_T(t)$ is empty. If $E_T(t)$ is empty, then we have a dominion
and we terminate the procedure (line 9). If no dominions are found, then we add
the new tangles to $Y$ and update $r$ (line 10). After fully decomposing the game
into regions, we add all new tangles to $T$ (line 11) and restart the procedure.
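The top-down decomposition that search walks through (lines 3 to 7 and the update of $r$ at line 10) can be sketched as below. To stay short, the sketch assumes an empty tangle set, so the tangle attractor degenerates to the standard attractor, and it leaves out tangle extraction and the dominion check. The function names `attractor` and `decompose` and the game encoding are hypothetical.

```python
from typing import Dict, List, Set


def attractor(game: Set[int], owner: Dict[int, int],
              succ: Dict[int, List[int]], alpha: int, target: Set[int]) -> Set[int]:
    """Standard attractor of player alpha to `target` inside the subgame `game`."""
    Z = set(target)
    changed = True
    while changed:
        changed = False
        for v in game - Z:
            moves = [w for w in succ[v] if w in game]
            if owner[v] == alpha:
                attracted = any(w in Z for w in moves)
            else:
                attracted = all(w in Z for w in moves)
            if attracted:
                Z.add(v)
                changed = True
    return Z


def decompose(game: Set[int], owner: Dict[int, int], succ: Dict[int, List[int]],
              prio: Dict[int, int]) -> Dict[int, int]:
    """Region function r: each vertex is mapped to the priority p of its region."""
    r: Dict[int, int] = {}
    while True:
        sub = game - set(r)                      # subgame G' = G \ r (line 5)
        if not sub:
            return r
        p = max(prio[v] for v in sub)            # top priority in G' (line 6)
        alpha = p % 2                            # the player who wins p
        top = {v for v in sub if prio[v] == p}
        for v in attractor(sub, owner, succ, alpha, top):  # next region (line 7)
            r[v] = p                             # record the region (line 10)


owner = {0: 0, 1: 1, 2: 0, 3: 1}
prio = {0: 4, 1: 3, 2: 2, 3: 0}
succ = {0: [1], 1: [0, 2], 2: [2], 3: [0]}
print(decompose({0, 1, 2, 3}, owner, succ, prio))
```

Here Odd's vertex 3 can only move to 0, so it joins the Even region of priority 4, and the remaining vertices form one region each.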


4.4 Extracting Tangles from a Region 


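To search for tangles in a given region $A$ of player $\alpha$ with strategy $\sigma$, we first
remove all vertices where player $\bar\alpha$ can play to lower regions (in $\Game'$) while player
$\alpha$ is constrained to $\sigma$, i.e.,

$$
\nu Z.\; A \cap \big(\{v \in V_{\bar\alpha} \mid E'(v) \subseteq Z\} \cup \{v \in V_\alpha \mid \sigma(v) \in Z\}\big)
$$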


This procedure can be implemented efficiently with a backward search, starting
from all vertices of priority $p$ that escape to lower regions. Since there can
be multiple vertices of priority $p$, the reduced region may consist of multiple
unconnected tangles. We compute all nontrivial bottom SCCs of the reduced
region, restricted by the strategy $\sigma$. Every such SCC is a unique $p$-tangle.
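A possible Python sketch of extract-tangles follows. The reduction is written as a naive greatest fixed point rather than the backward search mentioned above, and the SCCs are computed with a recursive version of Tarjan's algorithm, which is fine for small inputs but not for very deep graphs. The names `extract_tangles` and `tarjan_sccs` and the parameters are hypothetical: `sub` stands for the vertices of $\Game'$, `region` for $A$ and `sigma` for the strategy produced by the tangle attractor.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def tarjan_sccs(nodes: Set[int], edges: Dict[int, List[int]]) -> List[Set[int]]:
    """Strongly connected components via recursive Tarjan."""
    index: Dict[int, int] = {}
    low: Dict[int, int] = {}
    stack: List[int] = []
    on_stack: Set[int] = set()
    comps: List[Set[int]] = []

    def visit(v: int) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp: Set[int] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in nodes:
        if v not in index:
            visit(v)
    return comps


def extract_tangles(sub: Set[int], owner: Dict[int, int], succ: Dict[int, List[int]],
                    alpha: int, region: Set[int],
                    sigma: Dict[int, int]) -> List[Tuple[FrozenSet[int], Dict[int, int]]]:
    # 1. Reduction: drop vertices from which the opponent can leave the region
    #    (moves inside the subgame) while alpha follows sigma.
    R = set(region)
    changed = True
    while changed:
        changed = False
        for v in list(R):
            if owner[v] == alpha:
                stays = sigma.get(v) in R
            else:
                stays = all(w in R for w in succ[v] if w in sub)
            if not stays:
                R.discard(v)
                changed = True
    # 2. Restrict alpha-vertices to sigma; keep all opponent moves (all inside R).
    edges = {v: [sigma[v]] if owner[v] == alpha else [w for w in succ[v] if w in sub]
             for v in R}
    tangles = []
    for comp in tarjan_sccs(R, edges):
        bottom = all(w in comp for v in comp for w in edges[v])
        nontrivial = len(comp) > 1 or any(v in edges[v] for v in comp)
        if bottom and nontrivial:
            strategy = {v: sigma[v] for v in comp if owner[v] == alpha}
            tangles.append((frozenset(comp), strategy))
    return tangles


# Region {0..4} of Even in a subgame {0..5}; Odd's vertex 3 escapes to the
# lower-region vertex 5, which drags 2 and 4 out of the reduced region.
owner = {0: 0, 1: 1, 2: 0, 3: 1, 4: 1, 5: 1}
succ = {0: [1], 1: [0], 2: [3], 3: [2, 5], 4: [0, 2], 5: [5]}
print(extract_tangles({0, 1, 2, 3, 4, 5}, owner, succ, 0, {0, 1, 2, 3, 4}, {0: 1, 2: 3}))
```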


4.5 Tangle Learning Solves Parity Games 


We now prove properties of the proposed algorithm. 


Lemma 1. All regions recorded in $r$ in Algorithm 2 are $\alpha$-maximal in their
subgame.


Proof. This is vacuously true at the beginning of the search. Every region $Z$ is
$\alpha$-maximal as $Z$ is computed with $\mathit{TAttr}$ (line 7). Therefore the lemma remains
true when $r$ is updated at line 10. New tangles are only added to $T$ at line 11,
after which $r$ is reset to $\emptyset$.


Lemma 2. All plays consistent with $\sigma$ that stay in a region are won by player
$\alpha$.


Proof. Based on how the attractor computes the region, we show that all cycles
(consistent with $\sigma$) in the region are won by player $\alpha$. Initially, $Z$ only contains
vertices with priority $p$; therefore, any cycles in $Z$ are won by player $\alpha$. We
consider two cases: (a) When attracting a single vertex $v$, any new cycles must
involve vertices with priority $p$ from the initial set $A$, since all other $\alpha$-vertices in
$Z$ already have a strategy in $Z$ and all other $\bar\alpha$-vertices in $Z$ have only successors
in $Z$, otherwise they would not be attracted to $Z$. Since $p$ is the highest priority
in the region, every new cycle is won by player $\alpha$. (b) When attracting vertices of
a tangle, we set the strategy for all attracted vertices of player $\alpha$ to the witness
strategy of the tangle. Any new cycles either involve vertices with priority $p$ (as
above) or are cycles inside the tangle that are won by player $\alpha$.


Lemma 3. Player $\bar\alpha$ can reach a vertex with the highest priority $p$ from every
vertex in the region, via a path in the region that is consistent with strategy $\sigma$.


Proof. The proof is based on how the attractor computes the region. This
property is trivially true for the initial set of vertices with priority $p$. We consider
again two cases: (a) When attracting a single vertex $v$, $v$ is either an $\alpha$-vertex
with a strategy to play to $Z$, or an $\bar\alpha$-vertex whose successors are all in $Z$.
Therefore, the property holds for $v$. (b) Tangles are strongly connected w.r.t.
their witness strategy. Therefore player $\bar\alpha$ can reach every vertex of the tangle
and since the tangle is attracted to $Z$, at least one $\bar\alpha$-vertex can play to $Z$.
Therefore, the property holds for all attracted vertices of a tangle.


Lemma 4. For each new tangle $t$, all successors of $t$ are in higher $\alpha$-regions.


Proof. For every bottom SCC $B$ (computed in extract-tangles), $E'(v) \subseteq B$
for all $\bar\alpha$-vertices $v \in B$, otherwise player $\bar\alpha$ could leave $B$ and $B$ would not be a
bottom SCC. Recall that $E'(v)$ is restricted to edges in the subgame $\Game' = \Game \setminus r$.


Attracting Tangles to Solve Parity Games 205 


Therefore $E(v) \subseteq \mathrm{dom}(r) \cup B$ in the full game for all $\bar\alpha$-vertices $v \in B$. Recall
that $E_T(t)$ for a tangle $t$ refers to all successors for player $\bar\alpha$ that leave the tangle.
Hence, $E_T(t) \subseteq \mathrm{dom}(r)$ for every tangle $t := B$. Due to Lemma 1, no $\bar\alpha$-vertex
in $B$ can escape to a higher $\bar\alpha$-region. Thus $E_T(t)$ only contains vertices from
higher $\alpha$-regions when the new tangle is found by extract-tangles.


Lemma 5. Every nontrivial bottom SCC $B$ of the reduced region restricted by
witness strategy $\sigma$ is a unique $p$-tangle.


Proof. All $\alpha$-vertices $v$ in $B$ have a strategy $\sigma(v) \in B$, since $B$ is a bottom SCC
when restricted by $\sigma$. $B$ is strongly connected by definition. Per Lemma 2, player
$\alpha$ wins all plays consistent with $\sigma$ in the region and therefore also in $B$. Thus,
$B$ is a tangle. Per Lemma 3, player $\bar\alpha$ can always reach a vertex of priority $p$,
therefore any bottom SCC must include a vertex of priority $p$. Since $p$ is the
highest priority in the subgame, $B$ is a $p$-tangle. Furthermore, the tangle must
be unique. If the tangle had been found before, then per Lemmas 1 and 4, it would
have been attracted to a higher $\alpha$-region.


Lemma 6. The lowest region in the decomposition always contains a tangle. 


Proof. The lowest region is always nonempty after reduction in extract- 
tangles, as there are no lower regions. Furthermore, this region contains non- 
trivial bottom SCCs, since every vertex must have a successor in the region due 
to Lemma 1. 


Lemma 7. A tangle $t$ is a dominion if and only if $E_T(t) = \emptyset$.


Proof. If the tangle is a dominion, then player $\bar\alpha$ cannot leave it, therefore
$E_T(t) = \emptyset$. If $E_T(t) = \emptyset$, then player $\bar\alpha$ cannot leave the tangle and since
all plays consistent with $\sigma$ in the tangle are won by player $\alpha$, the tangle is a
dominion.


Lemma 8. $E_T(t) = \emptyset$ for every tangle $t$ found in the highest region of player $\alpha$.


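Proof. Per Lemma 4, $E_T(t) \subseteq \{v \in \mathrm{dom}(r) \mid r(v) \equiv p \pmod 2\}$ when the tangle is found,
and at that moment $\mathrm{dom}(r)$ only contains regions higher than the current one.
There are no higher regions of player $\alpha$, therefore $E_T(t) = \emptyset$.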


Lemma 9. The search algorithm terminates by finding a dominion. 


Proof. There is always a highest region of one of the players that is not empty. 
If a tangle is found in this region, then it is a dominion (Lemmas 7 and 8) and 
Algorithm 2 terminates (line 9). If no tangle is found in this region, then the 
opponent can escape to a lower region. Thus, if no dominion is found in a highest 
region, then there is a lower region that contains a tangle (Lemma 6) that must 
be unique (Lemma 5). As there are only finitely many unique tangles, eventually 
a dominion must be found. 


Lemma 10. The solve algorithm solves parity games. 
Proof. Every invocation of search returns a dominion of the game (Lemma 9).
The $\alpha$-attractor of a dominion won by player $\alpha$ is also a dominion of player $\alpha$.
Thus all vertices in $D$ are won by player $\alpha$. The winning strategy is derived as
the witness strategy of $d$ with the strategy obtained by the attractor at line 6.
At the end of solve every vertex of the game is either in $W_0$ or $W_1$.


4.6 Variations of Tangle Learning 


We propose three different variations of tangle learning that can be combined. 

The first variation is alternating tangle learning, where players take turns to 
maximally learn tangles, i.e., in a turn of player Even, we only search for tangles 
in regions of player Even, until no more tangles are found. Then we search only 
for tangles in regions of player Odd, until no more tangles are found. When 
changing players, the last decomposition can be reused. 

The second variation is on-the-fly tangle learning, where new tangles
immediately refine the decomposition. When new tangles are found, the decomposition
procedure is reset to the highest region that attracts one of the new tangles,
such that all regions in the top-down decomposition remain $\alpha$-maximal. This is
the region with priority $p := \max\{\min\{r(v) \mid v \in E_T(t)\} \mid t \in A\}$.
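The reset point of on-the-fly tangle learning is a direct evaluation of this formula. The Python sketch below assumes, as Lemma 4 guarantees, that every escape of a new tangle already has a region in `r`, and that no new tangle is a dominion, so each $E_T(t)$ is non-empty. The function name `reset_priority`, the encoding and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def reset_priority(new_tangles: List[FrozenSet[int]], r: Dict[int, int],
                   owner: Dict[int, int], succ: Dict[int, List[int]],
                   alpha: int) -> int:
    """Evaluate p := max{ min{ r(v) | v in E_T(t) } | t in A } for new alpha-tangles A."""

    def escapes(t: FrozenSet[int]) -> Set[int]:
        # E_T(t) in the full game: moves of the opponent's vertices that leave t.
        return {w for v in t if owner[v] != alpha for w in succ[v] if w not in t}

    # Inner min: lowest region that t escapes to; outer max: over all new tangles.
    return max(min(r[w] for w in escapes(t)) for t in new_tangles)


# Hypothetical data: Even tangles {0, 1} and {2, 3}; r gives the regions of their escapes.
owner = {0: 0, 1: 1, 2: 0, 3: 1}
succ = {0: [1], 1: [0, 10, 12], 2: [3], 3: [2, 11]}
r = {10: 8, 11: 6, 12: 4}
print(reset_priority([frozenset({0, 1}), frozenset({2, 3})], r, owner, succ, 0))
```

In this example the two tangles yield 4 and 6, so the decomposition would be reset to the region with priority 6.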

A third variation skips the reduction step in extract-tangles and only 
extracts tangles from regions where none of the vertices of priority p can escape 
to lower regions. This still terminates by finding a dominion, as Lemma 6 still
applies, i.e., we always extract tangles from the lowest region. Similar variations 
are also conceivable, such as only learning tangles from the lowest region. 


5 Complexity 


We establish a relation between the time complexity of tangle learning and the 
number of learned tangles. 


Lemma 11. Computing the top-down $\alpha$-maximal decomposition of a parity
game runs in time $O(dm + dn|T|)$ given a parity game with $d$ priorities, $n$
vertices and $m$ edges, and a set of tangles $T$.


Proof. The attractor $\mathit{Attr}_\alpha$ runs in time $O(n + m)$, if we record the number of
remaining outgoing edges for each vertex [23]. The attractor $\mathit{TAttr}^{\Game,T}_{\alpha}$ runs in
time $O(n + m + |T| + n|T|)$, if implemented in a similar style. As $m \geq n$, we
simplify to $O(m + n|T|)$. Since the decomposition computes at most $d$ regions,
the decomposition runs in time $O(dm + dn|T|)$.
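The $O(n + m)$ bound for the standard attractor relies on counting, for each opponent vertex, how many of its outgoing edges do not yet lead into the attracted set; the vertex is attracted when that counter reaches zero. A minimal Python sketch of this counter-based attractor follows (the name `attractor_linear` and the encoding are hypothetical). Each edge is inspected a constant number of times: once when the predecessor lists are built and once when its target is dequeued.

```python
from collections import deque
from typing import Deque, Dict, List, Set


def attractor_linear(game: Set[int], owner: Dict[int, int],
                     succ: Dict[int, List[int]], alpha: int, target: Set[int]):
    """Attr_alpha(target) in `game` in O(n + m), with a strategy for alpha-vertices."""
    pred: Dict[int, List[int]] = {v: [] for v in game}
    count: Dict[int, int] = {}          # remaining moves that avoid the attracted set
    for v in game:
        moves = [w for w in succ[v] if w in game]
        count[v] = len(moves)
        for w in moves:
            pred[w].append(v)
    Z: Set[int] = set(target)
    sigma: Dict[int, int] = {}
    queue: Deque[int] = deque(Z)
    while queue:
        w = queue.popleft()
        for v in pred[w]:
            if v in Z:
                continue
            if owner[v] == alpha:       # alpha only needs one move into Z
                Z.add(v)
                sigma[v] = w
                queue.append(v)
            else:                       # opponent: attracted once no move avoids Z
                count[v] -= 1
                if count[v] == 0:
                    Z.add(v)
                    queue.append(v)
    return Z, sigma


owner = {0: 0, 1: 1, 2: 0, 3: 1}
succ = {0: [0], 1: [0], 2: [1, 2], 3: [3]}
print(attractor_linear({0, 1, 2, 3}, owner, succ, 0, {0}))
```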


Lemma 12. Searching for tangles in the decomposition runs in time O(dm). 


Proof. The extract-tangles procedure consists of a backward search, which
runs in $O(n + m)$, and an SCC search based on Tarjan’s algorithm, which also
runs in $O(n + m)$. This procedure is performed at most $d$ times (once for each region).
As $m \geq n$, we simplify to $O(dm)$.
Lemma 13. Tangle learning runs in time $O(dnm|T| + dn^2|T|^2)$ for a parity
game with $d$ priorities, $n$ vertices, $m$ edges, and $|T|$ learned tangles.


Proof. Given Lemmas 11 and 12, each iteration in search runs in time
$O(dm + dn|T|)$. The number of iterations is at most $|T|$, since we learn at least 1 tangle
per iteration. Then search runs in time $O(dm|T| + dn|T|^2)$. Since each found
dominion is then removed from the game, there are at most $n$ calls to search.
Thus tangle learning runs in time $O(dnm|T| + dn^2|T|^2)$.
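Written out as a single product of the three factors used in the proof, the bound reads:

```latex
\underbrace{n}_{\text{calls to search}} \cdot
\underbrace{|T|}_{\text{iterations per call}} \cdot
\underbrace{O(dm + dn|T|)}_{\text{one iteration}}
  = O(dnm|T| + dn^2|T|^2)
```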


[Figure: a parity game on the vertices a, b, c, d, e, f, g, h]

Fig. 2. A parity game that requires several turns to find a dominion.


The complexity of tangle learning follows from the number of tangles that 
are learned before each dominion is found. Often not all tangles in a game need 
to be learned to solve the game, only certain tangles. Whether this number can 
be exponential in the worst case is an open question. These tangles often serve 
to remove distractions that prevent the other player from finding better tangles. 
This concept is illustrated by the example in Fig. 2, which requires multiple turns 
before a dominion is found. The game contains 4 tangles: {c}, {g} (a dominion), 
{a,b,c,d} and {a,e}. The vertices {e,f,g,h} do not form a tangle, since the 
opponent wins the loop of vertex g. The tangle {a, b,c,d} is a dominion in the 
remaining game after $\mathit{Attr}_1^{\Game}(\{g\})$ has been removed.

The tangle {g} is not found at first, as player Odd is distracted by h, i.e., 
prefers to play from g to h. Thus vertex h must first be attracted by the oppo- 
nent. This occurs when player Even learns the tangle {a,e}, which is then 
attracted to f, which then attracts h. However, the tangle {a,e} is blocked, 
as player Even is distracted by b. Vertex b is attracted by player Odd when 
they learn the tangle {c}, which is attracted to d, which then attracts b. So 
player Odd must learn tangle {c} so player Even can learn tangle {a,e}, which 
player Even must learn so player Odd can learn tangle {g} and win the dominion 
{e,f,g,h}, after which player Odd also learns {a,b,c,d} and wins the entire 
game. 

One can also understand the algorithm as the players learning that their 
opponent can now play from some vertex v via the learned tangle to a higher 
vertex w that is won by the opponent. In the example, we first learn that b 
actually leads to d via the learned tangle {c}. Now b is no longer safe for player 
Even. However, player Even can now play from both d and h via the learned
0-tangle {a,e} to f, so d and h are no longer interesting for player Odd and 
vertex b is again safe for player Even. 


6 Implementation 


We implement four variations of tangle learning in the parity game solver 
Oink [7]. Oink is a modern implementation of parity game algorithms writ- 
ten in C++. Oink implements priority promotion [3], Zielonka’s recursive algo- 
rithm [25], strategy improvement [11], small progress measures [17], and quasi- 
polynomial time progress measures [12]. Oink also implements self-loop solving 
and winner-controlled winning cycle detection, as proposed in [23]. The imple- 
mentation is publicly available via https://www.github.com/trolando/oink. 

We implement the following variations of tangle learning: standard tangle
learning (tl), alternating tangle learning (atl), on-the-fly tangle learning
(otftl) and on-the-fly alternating tangle learning (otfatl). The implementation
mainly differs from the presented algorithm in the following ways. We combine
the solve and search algorithms in one loop. We remember the highest
region that attracts a new tangle and reset the decomposition to that region 
instead of recomputing the full decomposition. In extract-tangles, we do not 
compute bottom SCCs for the highest region of a player, instead we return the 
entire reduced region as a single dominion (see also Lemma 8). 


7 Empirical Evaluation 


The goal of the empirical evaluation is to study tangle learning and its variations 
on real-world examples and random games. Due to space limitations, we do not 
report in detail on crafted benchmark families (generated by PGSolver [13]), 
except that none of these games is difficult in runtime or number of tangles. 
We use the parity game benchmarks from model checking and equivalence 
checking proposed by Keiren [19] that are publicly available online. These are 313 
model checking and 216 equivalence checking games. We also consider random 
games, in part because the literature on parity games tends to favor studying the 
behavior of algorithms on random games. We include two classes of self-loop-free 
random games generated by PGSolver [13] with a fixed number of vertices: 


- fully random games (randomgame N N 1 N x),
  $N \in \{1000, 2000, 4000, 7000\}$
- large low out-degree random games (randomgame N N 1 2 x),
  $N \in \{10000, 20000, 40000, 70000, 100000, 200000, 400000, 700000, 1000000\}$


We generate 20 games for each parameter N, in total 80 fully random games 
and 180 low out-degree games. All random games have N vertices and up to 
N distinct priorities. We include low out-degree games, since algorithms may 
behave differently on games where all vertices have few available moves, as also 
suggested in [3]. In fact, as we see in the evaluation, fully random games appear
trivial to solve, whereas games with few moves per vertex are more challenging. 
Furthermore, the fully random games have fewer vertices but require more disk 
space (40 MB per compressed file for N = 7000) than large low out-degree games 
(11 MB per compressed file for N = 1000000). 

We compare four variations of tangle learning to the implementations of 
Zielonka’s recursive algorithm (optimized version of Oink) and of priority pro- 
motion (implemented in Oink by the authors of [3]). The motivation for this 
choice is that [7] shows that these are the fastest parity game solving algorithms. 

In the following, we also use cactus plots to compare the algorithms. A cactus
plot shows, for each time limit Y, the number X of input games that an algorithm
solved individually within Y seconds.
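The data behind a cactus plot can be derived from per-game runtimes by sorting the solved instances, so that the X-th point gives the time needed for the X-th fastest solved game. A small Python sketch, where `None` marks a timeout; the function names and the sample runtimes are made up for illustration and do not come from the experiments:

```python
from typing import List, Optional, Tuple


def cactus_points(runtimes: List[Optional[float]]) -> List[Tuple[int, float]]:
    """(X, Y) pairs: X games were solved individually, the slowest of them in Y seconds."""
    solved = sorted(t for t in runtimes if t is not None)
    return [(i + 1, t) for i, t in enumerate(solved)]


def solved_within(runtimes: List[Optional[float]], limit: float) -> int:
    """Number of games solved within `limit` seconds, i.e. the curve's value at `limit`."""
    return sum(1 for t in runtimes if t is not None and t <= limit)


times = [4.0, None, 0.5, 12.0, 2.5]   # hypothetical runtimes in seconds, None = timeout
print(cactus_points(times))
print(solved_within(times, 5.0))
```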


Table 1. Runtimes in sec. and number of timeouts (20 min) of the solvers Zielonka
(zlk), priority promotion (pp), and tangle learning (tl, atl, otftl, otfatl).


Solver | MC&EC time | Random time | Random (large) time | Random (large) timeouts
pp     |        503 |          21 |               12770 |                       6
zlk    |        576 |          21 |               23119 |                      13
otfatl |        808 |          21 |                2281 |                       0
atl    |        817 |          21 |                2404 |                       0
otftl  |        825 |          21 |                2238 |                       0
tl     |        825 |          21 |                2312 |                       0


All experimental scripts and log files are available online via
https://www.github.com/trolando/tl-experiments. The experiments were performed on a
cluster of Dell PowerEdge M610 servers with two Xeon E5520 processors and 24 GB
internal memory each. The tools were compiled with gcc 5.4.0. 


7.1 Overall Results 


Table 1 shows the cumulative runtimes of the six algorithms. For the runs that 
timed out, we simply used the timeout value of 1200s, but this underestimates 
the actual runtime. 
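Concretely, each timed-out run contributes exactly 1200 s to the cumulative time, so the reported totals are lower bounds for every solver with timeouts. A short Python sketch of this accounting, with a hypothetical function name and made-up sample values:

```python
from typing import List, Optional, Tuple

TIMEOUT = 1200.0  # seconds (20 min), the value used for timed-out runs


def cumulative_runtime(runtimes: List[Optional[float]]) -> Tuple[float, int]:
    """Sum of runtimes with each timeout (None) counted as TIMEOUT, plus the number of timeouts."""
    total = sum(TIMEOUT if t is None else t for t in runtimes)
    timeouts = sum(1 for t in runtimes if t is None)
    return total, timeouts


print(cumulative_runtime([10.0, None, 5.0]))  # made-up runtimes
```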


7.2 Model Checking and Equivalence Checking Games 


See Fig. 3 for the cactus plot of the six solvers on model checking and equivalence 
checking games. This graph suggests that for most games, tangle learning is only 
slightly slower than the other algorithms. The tangle learning algorithms require 
at most 2x as much time for 12 of the 529 games. There is no significant difference 
between the four variations of tangle learning. 
Fig. 3. Cactus plots of the solvers Zielonka (zlk), priority promotion (pp) and tangle
learning (tl, atl, otftl, otfatl). The plot shows how many MC&EC games (top) or
large random games (bottom) are (individually) solved within the given time.


The 529 games have on average 1.86 million vertices and 5.85 million edges, 
and at most 40.6 million vertices and 167.5 million edges. All equivalence check- 
ing games have 2 priorities, so every tangle is a dominion. The model checking 
games have 2 to 4 priorities. Tangle learning learns non-dominion tangles for 
only 30 games, and more than 1 tangle only for the 22 games that check the 
infinitely_often_read_write property. The most extreme case is 1,572,864 
tangles for a game with 19,550,209 vertices. These are all 0-tangles that are then 
attracted to become part of 2-dominions. 

That priority promotion and Zielonka’s algorithm perform well is no surprise. 
See also Sect. 8.4. Solving these parity games requires few iterations for all algo- 
rithms, but tangle learning spends more time learning and attracting individual 
tangles, which the other algorithms do not do. Zielonka requires at most 27 
iterations, priority promotion at most 28 queries and 9 promotions. Alternating 
tangle learning requires at most 2 turns. We conclude that these games are not 
complex and that their difficulty is related to their sheer size. 


7.3 Random Games 


Table 1 shows no differences between the algorithms for the fully random games. 
Tangle learning learns no tangles except dominions for any of these games. This 
agrees with the intuition that the vast number of edges in these games lets
attractor-based algorithms quickly attract large portions of the game. 

See Fig. 3 for a cactus plot of the solvers on the larger random games. Only 
167 games were solved within 20 min each by Zielonka’s algorithm and only 174 
games by priority promotion. See Table 2 for details of the slowest 10 random 
games for alternating tangle learning. There is a clear correlation between the 
runtime, the number of tangles and the number of turns. One game is particularly 
interesting, as it requires significantly more time than the other games. 

The presence of one game that is much more difficult is a feature of using 
random games. It is likely that if we generated a new set of random games, we 
would obtain different results. This could be ameliorated by experimenting on 
thousands of random games and even then it is still a game of chance whether 
some of these random games are significantly more difficult than the others. 


Table 2. The 10 hardest random games for the atl algorithm, with time in seconds 
and size in number of vertices. 


Time    |   543 |   148 |   121 |   118 |   110 |    83 |    81 |    73 |    68 |    52
Tangles | 4,018 | 1,219 |   737 |   560 |   939 |   337 |   493 |   309 |   229 |   384
Turns   |    91 |    56 |    23 |    25 |    30 |    12 |    18 |    10 |    10 |    18
Size    |    1M |    1M |  700K |    1M |  700K |    1M |    1M |    1M |    1M |    1M


8 Tangles in Other Algorithms 


We argue that tangles play a fundamental role in various other parity game 
solving algorithms. We refer to [7] for descriptions of these algorithms. 


8.1 Small Progress Measures 


The small progress measures algorithm [17] iteratively performs local updates 
to vertices until a fixed point is reached. Each vertex is equipped with some 
measure that records a statistic of the best game either player knows that they 
can play from that vertex so far. By updating measures based on the successors, 
they essentially play the game backwards. When they can no longer perform 
updates, the final measures indicate the winning player of each vertex. 

The measures in small progress measures record how often each even priority
is encountered along the best play found so far until a higher priority is
encountered. As argued in [7,14], player Even tries to visit vertices with even
priorities as often as possible, while giving precedence to plays that visit higher
even priorities more often. This often resets progress for lower priorities. Player Odd has the
opposite goal, i.e., player Odd prefers to play to a lower even priority to avoid
a higher even priority, even if the lower priority is visited infinitely often. When
the measures record a play from some vertex that visits more vertices with some
even priority than exist in the game, this indicates that player Even can force
player Odd into a cycle, unless they concede and play to a higher even priority. A
mechanism called cap-and-carryover [7] ensures via slowly rising measures that
the opponent is forced to play to a higher even priority.

We argue that when small progress measures finds cycles of some priority p, 
this is due to the presence of a p-tangle, namely precisely those vertices whose 
measures increase beyond the number of vertices with priority p, since these 
measures can only increase so far in the presence of cycles out of which the 
opponent cannot escape except by playing to vertices with a higher even priority. 

One can now understand small progress measures as follows. The algorithm 
indirectly searches for tangles of player Even, and then searches for the best 
escape for player Odd by playing to the lowest higher even priority. If no such 
escape exists for a tangle, then the measures eventually rise to $\top$, indicating that
player Even has a dominion. Whereas tangle learning is affected by distractions, 
small progress measures is driven by the dual notion of aversions, i.e., high even 
vertices that player Odd initially tries to avoid. The small progress measures 
algorithm tends to find tangles repeatedly, especially when they are nested. 


8.2 Quasi-polynomial Time Progress Measures 


The quasi-polynomial time progress measures algorithm [12] is similar to small 
progress measures. This algorithm records the number of dominating even ver- 
tices along a play, i.e., such that every two such vertices are higher than all 
intermediate vertices. For example, in the path 1213142321563212, all vertices 
are dominated by each pair of underlined vertices of even priority. Higher even 
vertices are preferred, even if this (partially) resets progress on lower priorities. 

Tangles play a similar role as with small progress measures. The presence of 
a tangle lets the value iteration procedure increase the measure up to the point 
where the other player “escapes” the tangle via a vertex outside of the tangle. 
This algorithm has a similar weakness to nested tangles, but it is less severe 
as progress on lower priorities is often retained. In fact, the lower bound game 
in [12], for which the quasi-polynomial time algorithm is slow, is precisely based 
on nested tangles and is easily solved by tangle learning. 


8.3 Strategy Improvement 


As argued by Fearnley [10], tangles play a fundamental role in the behavior of 
strategy improvement. Fearnley writes that instead of viewing strategy improve- 
ment as a process that tries to increase valuations, one can view it as a process 
that tries to force “consistency with snares” [10, Sect. 6], i.e., as a process that 
searches for escapes from tangles. 


8.4 Priority Promotion 


Priority promotion [3,5] computes a top-down $\alpha$-maximal decomposition and
identifies “closed $\alpha$-regions”, i.e., regions where the losing player cannot escape to
lower regions. A closed $\alpha$-region is essentially a collection of possibly unconnected
tangles and vertices that are attracted to these tangles. Priority promotion then 
promotes the closed region to the lowest higher region that the losing player can 
play to, i.e., the lowest region that would attract one of the tangles in the region. 
Promoting is different from attracting, as tangles in a region can be promoted 
to a priority that they are not attracted to. Furthermore, priority promotion has 
no mechanism to remember tangles, so the same tangle can be discovered many 
times. This is somewhat ameliorated in extensions such as region recovery [2] and 
delayed promotion [1], which aim to decrease how often regions are recomputed. 

Priority promotion has a good practical performance for games where com- 
puting and attracting individual tangles is not necessary, e.g., when tangles are 
only attracted once and all tangles in a closed region are attracted to the same 
higher region, as is the case with the benchmark games of [19]. 


8.5 Zielonka’s Recursive Algorithm 


Zielonka’s recursive algorithm [25] also computes a top-down $\alpha$-maximal
decomposition, but instead of attracting from lower regions to higher regions, the
algorithm attracts from higher regions to tangles in the subgame. Essentially, the
algorithm starts with the tangles in the lowest region and attracts from higher
regions to these tangles. When vertices from a higher $\alpha$-region are attracted to
tangles of player $\bar\alpha$, progress for player $\alpha$ is reset. Zielonka’s algorithm also has
no mechanism to store tangles, and games that are exponential for Zielonka’s
algorithm, such as in [4], are trivially solved by tangle learning.


9 Conclusions 


We introduced the notion of a tangle as a subgraph of the game where one 
player knows how to win all cycles. We showed how tangles and nested tangles 
play a fundamental role in various parity game algorithms. These algorithms 
are not explicitly aware of tangles and can thus repeatedly explore the same 
tangles. We proposed a novel algorithm called tangle learning, which identifies 
tangles in a parity game and then uses these tangles to attract sets of vertices 
at once. The key insight is that tangles can be used with the attractor to form 
bigger (nested) tangles and, eventually, dominions. We evaluated tangle learning 
in a comparison with priority promotion and Zielonka’s recursive algorithm and 
showed that tangle learning is competitive for model checking and equivalence 
checking games, and outperforms other solvers for large random games. 

We repeat Fearnley’s assertion [10] that “a thorough and complete under- 
standing of how snares arise in a game is a necessary condition for devising a 
polynomial time algorithm for these games”. Fearnley also formulated the chal- 
lenge to give a clear formulation of how the structure of tangles in a given game 
affects the difficulty of solving it. We propose that a difficult game for tangle 
learning must be one that causes alternating tangle learning to have many turns 
before a dominion is found. 
Abstract. We propose $\delta$-complete decision procedures for solving
satisfiability of nonlinear SMT problems over real numbers that contain
universal quantification and a wide range of nonlinear functions.
The methods combine interval constraint propagation, counterexample- 
guided synthesis, and numerical optimization. In particular, we show 
how to handle the interleaving of numerical and symbolic computation 
to ensure delta-completeness in quantified reasoning. We demonstrate 
that the proposed algorithms can handle various challenging global opti- 
mization and control synthesis problems that are beyond the reach of 
existing solvers. 


1 Introduction 


Much progress has been made in the framework of delta-decision procedures for solving nonlinear Satisfiability Modulo Theories (SMT) problems over real numbers [1,2]. Delta-decision procedures allow one-sided bounded numerical errors, a practically useful relaxation that significantly reduces the computational complexity of the problems. With this relaxation, SMT problems with hundreds of variables and highly nonlinear constraints (such as differential equations) have been solved in practical applications [3]. Existing work in this direction has focused on satisfiability of quantifier-free SMT problems. Going one level up, SMT problems with both free and universally quantified variables, which correspond to ∃∀-formulas over the reals, are much more expressive. For instance, such formulas can encode the search for robust control laws in highly nonlinear dynamical systems, a central problem in robotics. Non-convex, multi-objective, and disjunctive optimization problems can all be encoded as ∃∀-formulas, through the natural definition of “finding some x such that for all other x', x is better than x' with respect to certain constraints.” Many other examples from various areas are listed in [4].

Counterexample-Guided Inductive Synthesis (CEGIS) [5] is a framework for program synthesis that can be applied to solve generic exists-forall problems. The idea is to break the process of solving ∃∀-formulas into a loop between synthesis and verification. The synthesis procedure finds solutions to the existentially quantified variables and gives the solutions to the verifier to see if they can be validated, or falsified by counterexamples. The counterexamples are then used as learned constraints for the synthesis procedure to find new solutions. This method has been shown effective for many challenging problems, frequently generating more optimized programs than the best manual implementations [5].

© The Author(s) 2018
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 219-235, 2018.
https://doi.org/10.1007/978-3-319-96142-2_15

S. Kong et al.

A direct application of CEGIS to decision problems over real numbers, however, suffers from several problems. CEGIS is complete in finite domains because it can explicitly enumerate solutions, which cannot be done in continuous domains. Also, CEGIS ensures progress by avoiding duplication of solutions, whereas precise control over real numbers is difficult due to numerical sensitivity. In this paper we propose methods that bypass such difficulties.
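For contrast with the continuous case, here is a minimal sketch of CEGIS over finite domains, where both the synthesis and the verification steps can simply enumerate candidates. It assumes plain enumeration in place of the solvers a real CEGIS implementation would call; the function name `cegis` and its parameters are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def cegis(xs: Iterable[X], ys: Iterable[Y], prop: Callable[[X, Y], bool]) -> Optional[X]:
    """Finite-domain CEGIS: return some x with prop(x, y) for every y, or None."""
    cand_pool, y_pool = list(xs), list(ys)
    counterexamples: List[Y] = []
    while True:
        # Synthesis: any x consistent with all counterexamples learned so far.
        candidate = next((x for x in cand_pool
                          if all(prop(x, y) for y in counterexamples)), None)
        if candidate is None:
            return None  # every x has been refuted: the exists-forall formula is false
        # Verification: look for a y that falsifies the candidate.
        cex = next((y for y in y_pool if not prop(candidate, y)), None)
        if cex is None:
            return candidate  # no counterexample: the candidate is a solution
        counterexamples.append(cex)  # learned constraint for the next round
```

Each counterexample rules out the current candidate, and the candidate pool is finite, so the loop terminates; this enumeration argument is exactly what is unavailable over the reals.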

We propose an integration of the CEGIS method in the branch-and-prune framework as a generic algorithm for solving nonlinear ∃∀-formulas over real numbers and prove that the algorithm is δ-complete. We achieve this goal by using CEGIS-based methods for turning universally quantified constraints into pruning operators, which are then used in the branch-and-prune framework for the search for solutions on the existentially quantified variables. In doing so, we take special care to ensure correct handling of numerical errors in the computation, so that δ-completeness can be established for the whole procedure.

The paper is organized as follows. We first review the background, and then present the details of the main algorithm in Sect. 3. We then give a rigorous proof of the δ-completeness of the procedure in Sect. 4. We demonstrate the effectiveness of the procedures on various global optimization and Lyapunov function synthesis problems in Sect. 5.


Related Work. Quantified formulas in real arithmetic can be solved using symbolic quantifier elimination (using cylindrical decomposition [6]), which is known to have impractically high complexity (double exponential [7]) and cannot handle problems with transcendental functions. State-of-the-art SMT solvers such as CVC4 [8] and Z3 [9] provide quantifier support [10-13], but they are limited to decidable fragments of first-order logic. Optimization Modulo Theories (OMT) is a new field that solves a restricted form of quantified reasoning [14-16], with a focus on linear formulas. Generic approaches to solving exists-forall problems such as [17] are generally based on the CEGIS framework and are not intended to achieve completeness. Solving quantified constraints has also been explored in the constraint solving community [18]. In general, existing work has not proposed algorithms that intend to achieve any notion of completeness for quantified problems in nonlinear theories over the reals.


2 Preliminaries 


2.1 Delta-Decisions and $\mathrm{CNF}^{\forall}$-Formulas


We consider first-order formulas over real numbers that can contain arbitrary nonlinear functions that can be numerically approximated, such as polynomials, exponential, and trigonometric functions. Theoretically, such functions are called Type-2 computable functions [19]. We write this language as $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$, formally defined as:


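Definition 1 (The $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$ Language). Let $\mathcal{F}$ be the set of Type-2 computable functions. We define $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$ to be the following first-order language:

$$t := x \mid f(t), \quad \text{where } f \in \mathcal{F}, \text{ possibly 0-ary (constant)};$$
$$\varphi := t(\vec{x}) > 0 \mid t(\vec{x}) \geq 0 \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid \exists x_i\, \varphi \mid \forall x_i\, \varphi.$$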


Remark 1. Negations are not needed as part of the base syntax, as they can be defined through arithmetic: $\neg(t > 0)$ is simply $-t \geq 0$. Similarly, an equality $t = 0$ is just $t \geq 0 \wedge -t \geq 0$. In this way we can put the formulas in normal forms that are easy to manipulate.
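The remark can be made concrete with a small Python sketch. It assumes atoms are represented as a pair of a comparison (`>` or `>=`) against 0 and a term given as a Python function; `holds`, `negate`, and `equality` are hypothetical helpers for illustration, not part of any solver API.

```python
from typing import Callable, List, Tuple

Term = Callable[[float], float]
Atom = Tuple[str, Term]  # (">", t) encodes t(x) > 0; (">=", t) encodes t(x) >= 0


def holds(atom: Atom, x: float) -> bool:
    op, t = atom
    return t(x) > 0 if op == ">" else t(x) >= 0


def negate(atom: Atom) -> Atom:
    """Eliminate a negation arithmetically:
    not (t > 0) == (-t >= 0) and not (t >= 0) == (-t > 0)."""
    op, t = atom

    def neg_t(x: float) -> float:
        return -t(x)

    return (">=", neg_t) if op == ">" else (">", neg_t)


def equality(t: Term) -> List[Atom]:
    """t = 0 as the conjunction t >= 0 and -t >= 0."""
    return [(">=", t), (">=", lambda x: -t(x))]
```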


We will focus on the ∃∀-formulas in $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$ in this paper. Decision problems for such formulas are equivalent to satisfiability of SMT with universally quantified variables, whose free variables are implicitly existentially quantified.

It is clear that, when the quantifier-free part of an ∃∀-formula is in Conjunctive Normal Form (CNF), we can always push the universal quantifiers inside each conjunct, since universal quantification commutes with conjunction. Thus the decision problem for any ∃∀-formula is equivalent to the satisfiability of formulas in the following normal form:


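Definition 2 ($\mathrm{CNF}^{\forall}$ Formulas in $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$). We say an $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$-formula φ is in $\mathrm{CNF}^{\forall}$ if it is of the form

$$\varphi(\vec{x}) := \bigwedge_{i=0}^{m} \Big( \forall \vec{y}\, \big( \bigvee_{j=0}^{k_i} c_{ij}(\vec{x}, \vec{y}) \big) \Big) \tag{1}$$

where $c_{ij}$ are atomic constraints. Each universally quantified conjunct of the formula, i.e.,

$$\forall \vec{y}\, \big( \bigvee_{j=0}^{k_i} c_{ij}(\vec{x}, \vec{y}) \big)$$

is called a ∀-clause. Note that ∀-clauses only contain disjunctions and no nested conjunctions. If all the ∀-clauses are vacuous, we say $\varphi(\vec{x})$ is a ground SMT formula.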


The algorithms described in this paper will assume that an input formula is in $\mathrm{CNF}^{\forall}$ form. We can now define the δ-satisfiability problems for $\mathrm{CNF}^{\forall}$-formulas.


Definition 3 (Delta-Weakening/Strengthening). Let $\delta \in \mathbb{Q}^{+}$ be arbitrary. Consider an arbitrary $\mathrm{CNF}^{\forall}$-formula of the form

$$\varphi(\vec{x}) := \bigwedge_{i=0}^{m} \Big( \forall \vec{y}\, \big( \bigvee_{j=0}^{k_i} f_{ij}(\vec{x}, \vec{y}) \circ 0 \big) \Big)$$

where $\circ \in \{>, \geq\}$. We define the δ-weakening of $\varphi(\vec{x})$ to be:

$$\varphi^{-\delta}(\vec{x}) := \bigwedge_{i=0}^{m} \Big( \forall \vec{y}\, \big( \bigvee_{j=0}^{k_i} f_{ij}(\vec{x}, \vec{y}) \geq -\delta \big) \Big).$$

Namely, we weaken the right-hand sides of all atomic formulas from 0 to −δ. Note how the difference between strict and nonstrict inequality becomes unimportant in the δ-weakening. We also define its dual, the δ-strengthening of $\varphi(\vec{x})$:

$$\varphi^{+\delta}(\vec{x}) := \bigwedge_{i=0}^{m} \Big( \forall \vec{y}\, \big( \bigvee_{j=0}^{k_i} f_{ij}(\vec{x}, \vec{y}) \geq +\delta \big) \Big).$$

Since the formulas in the normal form no longer contain negations, the relaxation on the atomic formulas is implied by the original formula (and thus weaker), as was easily shown in [1].


Proposition 1. For any φ and $\delta \in \mathbb{Q}^{+}$, $\varphi^{-\delta}$ is logically weaker, in the sense that $\varphi \rightarrow \varphi^{-\delta}$ is always true, but not vice versa.


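Example 1. Consider the formula
$$\forall y\, f(x,y) = 0.$$
It is equivalent to the $\mathrm{CNF}^{\forall}$-formula
$$\big(\forall y\, (-f(x,y) \geq 0)\big) \wedge \big(\forall y\, (f(x,y) \geq 0)\big),$$
whose δ-weakening is of the form
$$\big(\forall y\, (-f(x,y) \geq -\delta)\big) \wedge \big(\forall y\, (f(x,y) \geq -\delta)\big),$$
which is logically equivalent to
$$\forall y\, \big(\|f(x,y)\| \leq \delta\big).$$
We see that the weakening of $f(x,y) = 0$ by $\|f(x,y)\| \leq \delta$ defines a natural relaxation.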
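The example can be checked numerically on a concrete instance. The sketch below picks $f(x,y) = (x-1)\,y$ with $y \in [0,1]$ (our own illustrative choice), for which $\forall y\, f(x,y) = 0$ holds only at $x = 1$ and the δ-weakening is equivalent to $|x - 1| \leq \delta$. It assumes the universal quantifier is approximated by a finite grid of $y$ values, which is only an illustration and not a proof of the quantified statement; `forall_grid`, `original`, and `weakened` are hypothetical names.

```python
def forall_grid(pred, lo: float = 0.0, hi: float = 1.0, n: int = 100) -> bool:
    """Approximate 'for all y in [lo, hi]' by checking n + 1 evenly spaced points."""
    return all(pred(lo + i * (hi - lo) / n) for i in range(n + 1))


def f(x: float, y: float) -> float:
    return (x - 1.0) * y  # forall y in [0, 1]: f(x, y) = 0 holds only at x = 1


def original(x: float) -> bool:
    return forall_grid(lambda y: f(x, y) == 0.0)


def weakened(x: float, delta: float) -> bool:
    # delta-weakening of the CNF form: forall y (-f >= -delta) and forall y (f >= -delta)
    return (forall_grid(lambda y: -f(x, y) >= -delta)
            and forall_grid(lambda y: f(x, y) >= -delta))
```

A point such as x = 1.0005 satisfies the weakening for δ = 10⁻³ but not the original formula, which is the one-sided error that δ-decisions permit.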


Definition 4 (Delta-Completeness). Let $\delta \in \mathbb{Q}^{+}$ be arbitrary. We say an algorithm is δ-complete for ∃∀-formulas in $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$ if, for any input $\mathrm{CNF}^{\forall}$-formula φ, it always terminates and returns one of the following answers correctly:

- unsat: φ is unsatisfiable.
- δ-sat: $\varphi^{-\delta}$ is satisfiable.

When the two cases overlap, it can return either answer.
Algorithm 1. Branch-and-Prune
1: function Solve(f(x) = 0, B_x, δ)
2:   S ← {B_x}
3:   while S ≠ ∅ do
4:     B ← S.pop()
5:     B' ← FixedPoint(λB. B ∩ Prune(B, f(x) = 0), B)
6:     if B' ≠ ∅ then
7:       if ‖f(B')‖ > δ then
8:         {B_1, B_2} ← Branch(B')
9:         S.push({B_1, B_2})
10:      else
11:        return δ-sat
12:      end if
13:    end if
14:  end while
15:  return unsat
16: end function


2.2 The Branch-and-Prune Framework 


A practical algorithm that has been shown to be δ-complete for ground SMT formulas is the branch-and-prune method developed for interval constraint propagation [20]. A description of the algorithm in the simple case of an equality constraint is given in Algorithm 1.

The procedure combines pruning and branching operations. Let $\mathbb{B}$ be the set of all boxes (each variable assigned to an interval), and $C$ a set of constraints in the language. FixedPoint$(g, B)$ is a procedure computing a fixed point of a function $g : \mathbb{B} \to \mathbb{B}$ with an initial input $B$. A pruning operation Prune $: \mathbb{B} \times C \to \mathbb{B}$ takes a box $B \in \mathbb{B}$ and a constraint as input, and returns an ideally smaller box $B' \in \mathbb{B}$ (Line 5) that is guaranteed to still contain all solutions of the constraints, if there are any. When such pruning operations do not make progress, the Branch procedure picks a variable, splits its interval in half, and creates two sub-problems $B_1$ and $B_2$ (Line 8). The procedure terminates if either all boxes have been pruned to be empty (Line 15), or a small box whose maximum width is smaller than a given threshold δ has been found (Line 11). In [2], it has been proved that Algorithm 1 is δ-complete iff the pruning operators satisfy certain conditions for being well-defined (Definition 5).
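A minimal one-dimensional rendering of Algorithm 1 in Python follows, assuming a single equality constraint f(x) = 0 whose interval extension is supplied by the caller (here for f(x) = x² − 2), and a crude pruning operator that only discards boxes whose interval image excludes 0. Because that operator is idempotent, a single call already is the fixed point of Line 5. Plain floating point is used without outward rounding, so unlike a real interval library this sketch is not numerically rigorous; the names `solve`, `prune`, and `square_minus_two` are illustrative.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[float, float]
FRange = Callable[[Interval], Interval]  # interval extension of f


def square_minus_two(b: Interval) -> Interval:
    """Interval extension of f(x) = x**2 - 2 (plain floats, no outward rounding)."""
    lo, hi = b
    if lo >= 0:
        sq = (lo * lo, hi * hi)
    elif hi <= 0:
        sq = (hi * hi, lo * lo)
    else:
        sq = (0.0, max(lo * lo, hi * hi))
    return (sq[0] - 2.0, sq[1] - 2.0)


def prune(b: Interval, f_range: FRange) -> Optional[Interval]:
    """Crude pruning operator: drop B when 0 is not in the image f(B)."""
    flo, fhi = f_range(b)
    return None if flo > 0 or fhi < 0 else b


def solve(f_range: FRange, box: Interval, delta: float) -> Tuple[str, Optional[Interval]]:
    stack: List[Interval] = [box]
    while stack:                                  # Line 3
        b = prune(stack.pop(), f_range)           # Lines 4-5 (this prune is idempotent)
        if b is None:                             # Line 6: pruned to empty
            continue
        flo, fhi = f_range(b)
        if max(abs(flo), abs(fhi)) > delta:       # Line 7: ||f(B')|| > delta
            mid = (b[0] + b[1]) / 2               # Line 8: branch at the midpoint
            stack.extend([(b[0], mid), (mid, b[1])])
        else:
            return "delta-sat", b                 # Line 11
    return "unsat", None                          # Line 15
```

For f(x) = x² + 1 the image of every box stays positive, so the initial box is pruned to empty and the procedure reports unsat.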


3 Algorithm 


The core idea of our algorithm for solving $\mathrm{CNF}^{\forall}$-formulas is as follows. We view the universally quantified constraints as a special type of pruning operators, which can be used to reduce possible values for the free variables based on their consistency with the universally quantified variables. We then use these special ∀-pruning operators in an overall branch-and-prune framework to solve the full formula in a δ-complete way. A special technical difficulty for ensuring δ-completeness is to control numerical errors in the recursive search for counterexamples, which we solve using double-sided error control. We also improve the quality of counterexamples using local-optimization algorithms in the ∀-pruning operations, which we call locally-optimized counterexamples.

In the following sections we describe these steps in detail. For notational 
simplicity we will omit vector symbols and assume all variable names can directly 
refer to vectors of variables. 


3.1 ∀-Clauses as Pruning Operators


Consider an arbitrary $\mathrm{CNF}^{\forall}$-formula¹

$$\varphi(x) := \bigwedge_{i=0}^{m} \Big( \forall y\, \big( \bigvee_{j=0}^{k_i} f_{ij}(x,y) \geq 0 \big) \Big).$$

It is a conjunction of ∀-clauses as defined in Definition 2. Consequently, we only need to define pruning operators for ∀-clauses so that they can be used in a standard branch-and-prune framework. The full algorithm for such pruning operation is described in Algorithm 2.


Algorithm 2. ∀-Clause Pruning
1: function Prune(B_x, B_y, ∀y ∨_{i=0}^{k} f_i(x, y) ≥ 0, δ', ε, δ)
2:   repeat
3:     B_x^prev ← B_x
4:     ψ ← ∧_i f_i(x, y) < 0
5:     ψ^{+ε} ← Strengthen(ψ, ε)
6:     b ← Solve(y, ψ^{+ε}, δ')            ▷ 0 < δ' < ε < δ should hold.
7:     if b = ∅ then
8:       return B_x                         ▷ No counterexample found, stop pruning.
9:     end if
10:    for i ∈ {0, ..., k} do
11:      B_i ← B_x ∩ Prune(B_x, f_i(x, b) ≥ 0)
12:    end for
13:    B_x ← ⊔_{i=0}^{k} B_i
14:  until B_x = B_x^prev
15:  return B_x
16: end function


In Algorithm 2, the basic idea is to use special y values that witness the 
negation of the original constraint to prune the box assignment on x. The two 
core steps are as follows. 


1 Note that without loss of generality we only use nonstrict inequality here, since in the context of δ-decisions the distinction between strict and nonstrict inequalities is not important, as explained in Definition 3.
1. Counterexample generation (Lines 4 to 9). The query for a counterexample $y$ is defined as the negation of the quantifier-free part of the constraint (Line 4). The method Solve$(y, \psi, \delta)$ means to obtain a solution for the variables $y$ δ-satisfying the logic formula ψ. When such a solution is found, we have a counterexample that can falsify the ∀-clause on some choice of $x$. Then we use this counterexample to prune on the domain of $x$, which is currently $B_x$. The strengthening operation on ψ (Line 5), as well as the choices of ε and δ', will be explained in the next subsection.

2. Pruning on $x$ (Lines 10 to 13). In the counterexample generation step, we have obtained a counterexample $b$. The pruning operation then uses this value to prune on the current box domain $B_x$. Here we need to be careful about the logical operations. For each constraint, we need to take the intersection of the pruned results on the counterexample point (Line 11). Then, since the original clause contains the disjunction of all constraints, we need to take the box-hull ($\bigsqcup$) of the pruned results (Line 13).


We can now put the pruning operators defined for all ∀-clauses in the overall branch-and-prune framework shown in Algorithm 1.

The pruning algorithms are inspired by the CEGIS loop, but are different in multiple ways. First, we never explicitly compute any candidate solution for the free variables. Instead, we only prune on their domain boxes. This ensures that the size of the domain box decreases (together with branching operations), and the algorithm terminates. Secondly, we do not explicitly maintain a collection of constraints. Each time, the pruning operation works on the previous box, i.e., the learning is done at the model level instead of the constraint level. On the other hand, being unable to maintain arbitrary Boolean combinations of constraints requires us to be more sensitive to the type of Boolean operations needed in the pruning results, which is different from the CEGIS approach that treats solvers as black boxes.
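The sketch below mirrors the structure of Algorithm 2 on a deliberately simple, hypothetical class of ∀-clauses: one free variable x, one universally quantified y, and disjuncts of the form a·x − g(y) ≥ 0 with a = ±1, so that pruning each disjunct for a fixed y = b is exact. A grid search over y stands in for the δ'-decision procedure Solve (an assumption; it is neither sound nor complete in general), and ε plays the role of the strengthening parameter. The clause in the usage example, ∀y ∈ [0, 1]: (x − y ≥ 0) ∨ (−x − y ≥ 0), holds exactly when |x| ≥ 1.

```python
import math
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float]
# Hypothetical disjunct encoding (a, g) meaning a*x - g(y) >= 0 with a in {+1, -1}.
Disjunct = Tuple[int, Callable[[float], float]]


def meet(p: Optional[Box], q: Optional[Box]) -> Optional[Box]:
    """Box intersection; None stands for the empty box."""
    if p is None or q is None:
        return None
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    return (lo, hi) if lo <= hi else None


def hull(boxes: List[Optional[Box]]) -> Optional[Box]:
    """Box hull: the smallest box containing all non-empty boxes."""
    live = [b for b in boxes if b is not None]
    if not live:
        return None
    return (min(b[0] for b in live), max(b[1] for b in live))


def find_counterexample(clause: List[Disjunct], bx: Box, by: Box,
                        eps: float, n: int = 100) -> Optional[float]:
    """Grid-search stand-in for Solve(y, psi^{+eps}, delta'): look for y such that
    some x in bx satisfies a*x - g(y) <= -eps for every disjunct."""
    for i in range(n + 1):
        y = by[0] + i * (by[1] - by[0]) / n
        region: Optional[Box] = bx
        for a, g in clause:
            half = (-math.inf, g(y) - eps) if a == 1 else (eps - g(y), math.inf)
            region = meet(region, half)
        if region is not None:
            return y
    return None


def prune_forall(clause: List[Disjunct], bx: Box, by: Box, eps: float) -> Optional[Box]:
    """Fixed-point loop: query a counterexample, prune each disjunct, take the hull."""
    while True:
        b = find_counterexample(clause, bx, by, eps)
        if b is None:
            return bx  # no counterexample found: stop pruning
        # Prune bx with each a*x - g(b) >= 0 separately (intersections) ...
        parts = [meet(bx, (g(b), math.inf) if a == 1 else (-math.inf, -g(b)))
                 for a, g in clause]
        # ... and combine them with the box hull, since the clause is a disjunction.
        new_bx = hull(parts)
        if new_bx is None or new_bx == bx:
            return new_bx  # empty box, or no progress (fixed point)
        bx = new_bx
```

Starting from B_x = [−3, 3], the first counterexample yields the pieces [0.01, 3] and [−3, −0.01], whose hull is the whole box again, so the loop stops and the outer branch-and-prune has to split the box. Starting from B_x = [0.5, 3], only one disjunct survives and the box is contracted to [1, 3].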


3.2 Double-Sided Error Control 


To ensure the correctness of Algorithm 2, it is necessary to avoid spurious counterexamples which do not satisfy the negation of the quantified part in a ∀-clause. We illustrate this condition by considering a wrong derivation of Algorithm 2 where we do not have the strengthening operation on Line 5 and try to find a counterexample by directly executing $b \leftarrow \mathrm{Solve}(y, \psi := \bigwedge_{i=0}^{k} f_i(x,y) < 0, \delta)$. Note that the counterexample query ψ can be highly nonlinear in general and not included in a decidable fragment. As a result, it must employ a delta-decision procedure (i.e., Solve with $\delta \in \mathbb{Q}^{+}$) to find a counterexample. A consequence of relying on a delta-decision procedure in the counterexample generation step is that we may obtain a spurious counterexample $b$ such that for some $x = a$:

$$\bigwedge_{i=0}^{k} f_i(a,b) \leq \delta \quad \text{instead of} \quad \bigwedge_{i=0}^{k} f_i(a,b) < 0.$$
Consequently, the subsequent pruning operations fail to reduce their input boxes, because a spurious counterexample does not witness any inconsistency between $x$ and $y$. As a result, the fixed-point loop in this ∀-clause pruning algorithm terminates immediately after the first iteration. This makes the outermost branch-and-prune framework (Algorithm 1), which employs this pruning algorithm, rely solely on branching operations. It can claim that the problem is δ-satisfiable while providing an arbitrary box $B$ as a model which is small enough ($\|B\| < \delta$) but does not include a δ-solution.

To avoid spurious counterexamples, we directly strengthen the counterexam- 
ple query with £ € Qt to have 


k 
pre := N fila,d) < —e. 
i=0 


Then we choose a weakening parameter 6’ € Q in solving the strengthened for- 
mula. By analyzing the two possible outcomes of this counterexample search, we 
show the constraints on 6’, €, and 6 which guarantee the correctness of Algo- 
rithm 2: 


- δ'-sat case: We have $a$ and $b$ such that $\bigwedge_{i=0}^{k} f_i(a,b) \leq -\varepsilon + \delta'$. For $y = b$ to be a valid counterexample, we need $-\varepsilon + \delta' < 0$. That is, we have

$$\delta' < \varepsilon. \tag{2}$$

In other words, the strengthening factor ε should be greater than the weakening parameter δ' in the counterexample search step.

- unsat case: By checking the absence of counterexamples, we have proved that $\forall y \bigvee_{i=0}^{k} f_i(x,y) > -\varepsilon$ for all $x \in B_x$. Recall that we want to show that $\forall y \bigvee_{i=0}^{k} f_i(x,y) \geq -\delta$ holds for some $x = a$ when Algorithm 1 uses this pruning algorithm and returns δ-sat. To ensure this property, we need the following constraint on ε and δ:

$$\varepsilon < \delta. \tag{3}$$
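A quick numeric instance of conditions (2) and (3), using the illustrative values δ = 10⁻⁴, ε = 0.99δ, and δ' = 0.98δ (chosen here only to make the arithmetic concrete):

```latex
\begin{aligned}
\delta &= 10^{-4}, \qquad \varepsilon = 0.99\,\delta = 9.9 \times 10^{-5}, \qquad \delta' = 0.98\,\delta = 9.8 \times 10^{-5},\\
-\varepsilon + \delta' &= -0.01\,\delta = -10^{-6} < 0 \qquad \text{(condition (2): } \delta' < \varepsilon\text{)},\\
\varepsilon &= 9.9 \times 10^{-5} < 10^{-4} = \delta \qquad \text{(condition (3))}.
\end{aligned}
```

Under these values, any counterexample returned by the δ'-decision procedure on $\psi^{+\varepsilon}$ satisfies $f_i(a,b) \leq -10^{-6} < 0$ for every $i$, so it is genuine, while $\varepsilon < \delta$ keeps the unsat outcome strong enough for the δ-sat guarantee.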


3.3 Locally-Optimized Counterexamples 


The performance of the pruning algorithm for $\mathrm{CNF}^{\forall}$-formulas depends on the quality of the counterexamples found during the search.

Figure 1a illustrates this point by visualizing a pruning process for an unconstrained minimization problem, $\exists x \in X_0\, \forall y \in X_0\; f(x) \leq f(y)$. As it finds a series of counterexamples $CE_1$, $CE_2$, $CE_3$, and $CE_4$, the pruning algorithm uses those counterexamples to contract the interval assignment on $X$ from $X_0$ to $X_1$, $X_2$, $X_3$, and $X_4$ in sequence. In the search for a counterexample (Line 6 of Algorithm 2), it solves the strengthened query $f(x) > f(y) + \delta$. Note that the query only requires a counterexample $y = b$ to be δ-away from a candidate $x$, while it is clear that the further a counterexample is from the candidates, the more effective the pruning algorithm is.
(a) Without local optimization. (b) With local optimization.

Fig. 1. Illustrations of the pruning algorithm for $\mathrm{CNF}^{\forall}$-formulas with and without using local optimization.


Based on this observation, we present a way to improve the performance of the pruning algorithm for $\mathrm{CNF}^{\forall}$-formulas. After we obtain a counterexample $b$, we locally optimize it with respect to the counterexample query ψ so that it “further violates” the constraints. Figure 1b illustrates this idea. The algorithm first finds a counterexample $CE_1$, then refines it to $CE_1'$ by using a local-optimization algorithm (similarly, $CE_2 \to CE_2'$). Clearly, this refined counterexample gives a stronger pruning power than the original one. This refinement process can also help the performance of the algorithm by reducing the total number of iterations in the fixed-point loop.

The suggested method is based on the assumption that local-optimization techniques are cheaper than finding a global counterexample using interval propagation techniques. In our experiments, we observed that this assumption holds in practice. We report the details in Sect. 5.
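The effect of locally-optimized counterexamples can be seen on a toy instance of the minimization-style clause ∀y ∈ [0, 2]: x − g(y) ≥ 0 with g(y) = y(2 − y), which holds exactly when x ≥ 1. In this hypothetical Python sketch, a grid search stands in for the counterexample query and a naive hill climber stands in for the local optimizer (the implementation described in Sect. 5 uses NLopt's SLSQP and COBYLA instead). Pruning with x − g(b) ≥ 0 is exact here, so the only difference between the two runs is how far each counterexample pushes the lower bound of B_x.

```python
from typing import Callable, Optional, Tuple

Box = Tuple[float, float]


def find_cex(g: Callable[[float], float], bx: Box, by: Box, eps: float,
             n: int = 200) -> Optional[float]:
    """Grid-search stand-in for the counterexample query: the first y with
    x - g(y) <= -eps for some x in bx (it suffices to test x = bx[0])."""
    for i in range(n + 1):
        y = by[0] + i * (by[1] - by[0]) / n
        if bx[0] - g(y) <= -eps:
            return y
    return None


def hill_climb(g: Callable[[float], float], b: float, by: Box,
               step: float = 0.1, tol: float = 1e-6) -> float:
    """Naive local optimizer: move b uphill on g (so x - g(b) >= 0 is violated
    further), halving the step whenever neither neighbour improves."""
    while step > tol:
        for cand in (b - step, b + step):
            if by[0] <= cand <= by[1] and g(cand) > g(b):
                b = cand
                break
        else:
            step /= 2
    return b


def prune(g: Callable[[float], float], bx: Box, by: Box, eps: float,
          local_opt: bool) -> Tuple[Optional[Box], int]:
    """Prune x for the clause  forall y in by: x - g(y) >= 0.
    Returns the pruned box (None if empty) and the number of counterexample queries."""
    queries = 0
    while True:
        queries += 1
        b = find_cex(g, bx, by, eps)
        if b is None:
            return bx, queries
        if local_opt:
            b = hill_climb(g, b, by)
        lo = max(bx[0], g(b))  # exact pruning with x - g(b) >= 0
        if lo > bx[1]:
            return None, queries
        bx = (lo, bx[1])
```

With ε = 0.01, the plain run raises the lower bound in small increments (at least ε per query), whereas the locally optimized run jumps to the maximizer y ≈ 1 after its first counterexample and needs only one more query to confirm that no counterexample remains.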


4 δ-Completeness


We now prove that the proposed algorithm is δ-complete for arbitrary $\mathrm{CNF}^{\forall}$-formulas in $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$. In the work of [2], δ-completeness has been proved for branch-and-prune for ground SMT problems, under the assumption that the pruning operators are well-defined. Thus, the key for our proof here is to show that the ∀-pruning operators satisfy the conditions of well-definedness.

The notion of a well-defined pruning operator is defined in [2] as follows. 


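Definition 5. Let φ be a constraint, and $\mathbb{B}$ be the set of all boxes in $\mathbb{R}^n$. A pruning operator is a function Prune $: \mathbb{B} \times C \to \mathbb{B}$. We say such a pruning operator is well-defined if, for any $B \in \mathbb{B}$, the following conditions are true:

1. Prune$(B, \varphi) \subseteq B$.

2. $B \cap \{a \in \mathbb{R}^n : \varphi(a) \text{ is true}\} \subseteq$ Prune$(B, \varphi)$.

3. Write Prune$(B, \varphi) = B'$. There exists a constant $c \in \mathbb{Q}^{+}$ such that, if $B' \neq \emptyset$ and $\|B'\| < \varepsilon$ for some $\varepsilon \in \mathbb{Q}^{+}$, then for all $a \in B'$, $\varphi^{-c\varepsilon}(a)$ is true.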
We will explain the intuition behind these requirements in the next proof, which
aims to establish that Algorithm 2 defines a well-defined pruning operator. 


Lemma 1 (Well-Definedness of ∀-Pruning). Consider an arbitrary ∀-clause in the generic form

$$c(x) := \forall y\, \big( f_1(x,y) \geq 0 \vee \ldots \vee f_k(x,y) \geq 0 \big).$$

Suppose the pruning operators for $f_1 \geq 0, \ldots, f_k \geq 0$ are well-defined; then the ∀-pruning operation for $c(x)$ as described in Algorithm 2 is well-defined.


Proof. We prove that the pruning operator defined by Algorithm 2 satisfies the three conditions in Definition 5. Let $B_0, \ldots, B_k$ be a sequence of boxes, where $B_0$ is the input box $B_x$ and $B_k$ is the returned box $B_x$, which is possibly empty.

The first condition requires that the pruning operation for $c(x)$ is reductive. That is, we want to show that $B_x \subseteq B_x^{prev}$ holds in Algorithm 2. If it does not find a counterexample (Line 8), we have $B_x = B_x^{prev}$, so the condition holds trivially. Consider the case where it finds a counterexample $b$. The pruned box $B_x$ is obtained through the box-hull of all the $B_i$ boxes (Line 13), which are results of pruning on $B_x^{prev}$ using ordinary constraints of the form $f_i(x,b) \geq 0$ (Line 11), for a counterexample $b$. Following the assumption that the pruning operators are well-defined for each ordinary constraint $f_i$ used in the algorithm, we know that $B_i \subseteq B_x^{prev}$ holds as a loop invariant for the loop from Line 10 to Line 12. Thus, taking the box-hull of all the $B_i$, we obtain a $B_x$ that is still a subset of $B_x^{prev}$.

The second condition requires that the pruning operation does not eliminate real solutions. Again, this follows from the assumption that the pruning operation on Line 11 does not lose any valid assignment on $x$ that makes the ∀-clause true. In fact, since $y$ is universally quantified, any choice of assignment $y = b$ preserves the solutions on $x$ as long as the ordinary pruning operator is well-defined. Thus, this condition is easily satisfied.

The third condition is the most nontrivial to establish. It ensures that when the pruning operator does not prune a box to the empty set, then the box should not be “way off”, and in fact, should contain points that satisfy an appropriate relaxation of the constraint. We can say this is a notion of “faithfulness” of the pruning operator. For constraints defined by simple continuous functions, this can typically be guaranteed by the modulus of continuity of the function (Lipschitz constants as a special case). Now, in the case of ∀-clause pruning, we need to prove that the faithfulness of the ordinary pruning operators that are used translates to the faithfulness of the ∀-clause pruning results. First of all, this condition would not hold if we did not have the strengthening operation when searching for counterexamples (Line 5). As is shown in Example 1, because of the weakening that δ-decisions introduce when searching for a counterexample, we may obtain a spurious counterexample that does not have pruning power. In other words, if we keep using a wrong counterexample that already satisfies the condition, then we are not able to rule out wrong assignments on $x$. Now, since we have introduced ε-strengthening at the counterexample search, we know that $b$ obtained on Line 6 is a true counterexample. Thus, for some $x = a$, $f_i(a,b) < 0$ for every $i$. By assumption, the ordinary pruning operation using $b$ on Line 11 guarantees faithfulness. That is, suppose the pruned result $B_i$ is not empty and $\|B_i\| < \varepsilon$; then there exists a constant $c_i$ such that $f_i(a,b) \geq -c_i\varepsilon$ is true. Thus, we can take $c = \min_i c_i$ as the constant for the pruning operator defined by the full clause, and conclude that the disjunction $\bigvee_{i=0}^{k} f_i(x,y) \geq -c\varepsilon$ holds for $\|B_x\| < \varepsilon$.


Using the lemma, we follow the results in [2] and conclude that the branch-and-prune method in Algorithm 1 is delta-complete:

Theorem 1 (δ-Completeness). For any $\delta \in \mathbb{Q}^{+}$, using the proposed ∀-pruning operators defined in Algorithm 2 in the branch-and-prune framework described in Algorithm 1 is δ-complete for the class of $\mathrm{CNF}^{\forall}$-formulas in $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$, assuming that the pruning operators for all the base functions are well-defined.

Proof. Following Theorem 4.2 (δ-Completeness of ICP$_\varepsilon$) in [2], a branch-and-prune algorithm is δ-complete iff the pruning operators in the algorithm are all well-defined. Following Lemma 1, Algorithm 2 always defines well-defined pruning operators, assuming the pruning operators for the base functions are well-defined. Consequently, Algorithms 2 and 1 together define a delta-complete decision procedure for $\mathrm{CNF}^{\forall}$-problems in $\mathcal{L}_{\mathbb{R}_{\mathcal{F}}}$.


5 Evaluation 


Implementation. We implemented the algorithms on top of dReal [21], an open-source delta-SMT framework. We used IBEX-lib [22] for interval constraint pruning and CLP [23] for linear programming. For local optimization, we used NLopt [24]. In particular, we used the SLSQP (Sequential Least-Squares Quadratic Programming) local-optimization algorithm [25] for differentiable constraints and the COBYLA (Constrained Optimization BY Linear Approximations) local-optimization algorithm [26] for non-differentiable constraints. The prototype solver is able to handle ∃∀-formulas that involve most standard elementary functions, including power, exp, log, √·, trigonometric functions (sin, cos, tan), inverse trigonometric functions (arcsin, arccos, arctan), hyperbolic functions (sinh, cosh, tanh), etc.


Experiment environment. All experiments were run on a 2017 MacBook Pro with a 2.9 GHz Intel Core i7 and 16 GB RAM running macOS 10.13.4. All code and benchmarks are available at https://github.com/dreal/CAV18.


Parameters. In the experiments, we chose the strengthening parameter ε = 0.99δ and the weakening parameter in the counterexample search δ' = 0.98δ. In each call to NLopt, we used 1e−6 for both the absolute and relative tolerances on the function value, 1e−3 s for a timeout, and 100 for the maximum number of evaluations. These values are used as stopping criteria in NLopt.
Table 1. Experimental results for nonlinear global optimization problems: The first 19 problems (Ackley 2D to Zettl) are unconstrained optimization problems and the last five problems (Rosenbrock Cubic to Simionescu) are constrained optimization problems. We ran our prototype solver over those instances with and without the local-optimization option (“L-Opt.” and “No L-Opt.” columns) and compared the results. We chose δ = 0.0001 for all instances.

Name | Global solution | Solution (No L-Opt.) | Solution (L-Opt.) | Time (sec), No L-Opt. | Time (sec), L-Opt. | Speedup
Ackley 2D | 0.00000 | 0.00000 | 0.00000 | 0.0579 | 0.0047 | 12.32
Ackley 4D | 0.00000 | 0.00005 | 0.00000 | 8.2256 | 0.1930 | 42.62
Aluffi Pentini | -0.35230 | -0.35231 | -0.35239 | 0.0321 | 0.1868 | 0.17
Beale | 0.00000 | 0.00003 | 0.00000 | 0.0317 | 0.0615 | 0.52
Bohachevsky1 | 0.00000 | 0.00006 | 0.00000 | 0.0094 | 0.0020 | 4.70
Booth | 0.00000 | 0.00006 | 0.00000 | 0.5035 | 0.0020 | 251.75
Brent | 0.00000 | 0.00006 | 0.00000 | 0.0095 | 0.0017 | 5.59
Bukin6 | 0.00000 | 0.00003 | 0.00003 | 0.0093 | 0.0083 | 1.12
Cross in tray | -2.06261 | -2.06254 | -2.06260 | 0.5669 | 0.1623 | 3.49
Easom | -1.00000 | -1.00000 | -1.00000 | 0.0061 | 0.0030 | 2.03
EggHolder | -959.64070 | -959.64030 | -959.64031 | 0.0446 | 0.0211 | 2.11
Holder Table 2 | -19.20850 | -19.20846 | -19.20845 | 52.9152 | 41.7004 | 1.27
Levi N13 | 0.00000 | 0.00000 | 0.00000 | 0.1383 | 0.0034 | 40.68
Ripple 1 | -2.20000 | -2.20000 | -2.20000 | 0.0059 | 0.0065 | 0.91
Schaffer F6 | 0.00000 | 0.00004 | 0.00000 | 0.0531 | 0.0056 | 9.48
Testtube holder | -10.87230 | -10.87227 | -10.87230 | 0.0636 | 0.0035 | 18.17
Trefethen | -3.30687 | -3.30681 | -3.30685 | 3.0689 | 1.4916 | 2.06
W Wavy | 0.00000 | 0.00000 | 0.00000 | 0.1234 | 0.0138 | 8.94
Zettl | -0.00379 | -0.00375 | -0.00379 | 0.0070 | 0.0069 | 1.01
Rosenbrock Cubic | 0.00000 | 0.00005 | 0.00002 | 0.0045 | 0.0036 | 1.25
Rosenbrock Disk | 0.00000 | 0.00002 | 0.00000 | 0.0036 | 0.0028 | 1.29
Mishra Bird | -106.76454 | -106.76449 | -106.76451 | 1.8496 | 0.9122 | 2.03
Townsend | -2.02399 | -2.02385 | -2.02390 | 2.6216 | 0.5817 | 4.51
Simionescu | -0.07262 | -0.07199 | -0.07200 | 0.0064 | 0.0048 | 1.33


5.1 Nonlinear Global Optimization 


We encoded a range of highly nonlinear ∃∀-problems from the constrained and unconstrained optimization literature [27,28]. Note that the standard optimization problem

$$\min f(x) \quad \text{s.t.} \quad \varphi(x),\; x \in \mathbb{R}^n,$$

can be encoded as the logic formula:

$$\varphi(x) \wedge \forall y\, \big(\varphi(y) \rightarrow f(x) \leq f(y)\big).$$
(a) Ackley Function. (b) EggHolder Function. (c) Holder Table2 Function. (d) Levi N13 Function. (e) Ripple 1 Function. (f) Testtube Holder Function.

Fig. 2. Nonlinear global optimization examples.


As plotted in Fig. 2, these optimization problems are non-trivial: they are highly non-convex problems that are designed to test global optimization or genetic programming algorithms. Many such functions have a large number of local minima. For example, the Ripple 1 function [27]

$$f(x_1, x_2) = \sum_{i=1}^{2} -e^{-2 \log 2 \left(\frac{x_i - 0.1}{0.8}\right)^2} \left(\sin^6(5\pi x_i) + 0.1 \cos^2(500\pi x_i)\right),$$

defined on $x_i \in [0,1]$, has 252004 local minima, with the global minimum $f(0.1, 0.1) = -2.2$. As a result, local-optimization algorithms such as gradient descent would not work on these problems by themselves. By encoding them as ∃∀-problems, we can perform guaranteed global optimization on these problems.
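The closed form of the Ripple 1 function is easy to evaluate, which gives a quick sanity check of the stated global minimum. This Python sketch only evaluates the formula above and is not a global optimizer. Each summand is bounded below by −1.1, since the exponential factor is at most 1 and sin⁶ + 0.1 cos² is at most 1.1, so −2.2 is a lower bound everywhere.

```python
import math


def ripple1(x1: float, x2: float) -> float:
    """Ripple 1 test function on [0, 1]^2, written term by term as in the formula above."""
    total = 0.0
    for xi in (x1, x2):
        envelope = math.exp(-2.0 * math.log(2.0) * ((xi - 0.1) / 0.8) ** 2)
        total -= envelope * (math.sin(5 * math.pi * xi) ** 6
                             + 0.1 * math.cos(500 * math.pi * xi) ** 2)
    return total
```

A grid evaluation cannot certify the minimum of a function with this many local minima; that is what the ∃∀ encoding and the δ-complete search are for.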
Table 1 provides a summary of the experimental results. First, it shows that we can find minimum values which are close to the known global solutions. Second, it shows that enabling the local-optimization technique speeds up the solving process significantly for 20 out of 23 instances.


5.2 Synthesizing Lyapunov Function for Dynamical System 


We show that the proposed algorithm is able to synthesize Lyapunov functions for nonlinear dynamical systems described by a set of ODEs:

$$\dot{x}_i(t) = f_i(x(t)), \quad \forall x(t) \in X.$$


Our approach is different from a recent related work [29], which used dReal only to verify a candidate function found by a simulation-guided algorithm. In contrast, we perform both the search and the verification steps by solving a single ∃∀-formula. Note that to verify a candidate Lyapunov function $v : X \to \mathbb{R}^{+}$, we need $v$ to satisfy the following conditions:

$$\forall x \in X \setminus \{0\}\;\; v(x) > 0 \;\wedge\; v(0) = 0$$
$$\forall x \in X\;\; \nabla v(x(t))^{T} \cdot f_i(x(t)) \leq 0$$

We assume that a Lyapunov function is a polynomial of some fixed degree over $x$, that is, $v(x) = z^{T} P z$, where $z$ is a vector of monomials over $x$ and $P$ is a symmetric matrix. Then, we can encode this synthesis problem into the ∃∀-formula:

$$\exists P\, \Big[ \big(v(x) = z^{T} P z\big) \wedge \big(\forall x \in X \setminus \{0\}\;\; v(x) > 0 \wedge v(0) = 0\big) \wedge \big(\forall x \in X\;\; \nabla v(x(t))^{T} \cdot f_i(x(t)) \leq 0\big) \Big]$$
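Sampling can only falsify a Lyapunov candidate, never prove it, which is why the encoding above states the conditions as universally quantified constraints for the δ-decision procedure. A sampling check is still a cheap way to reject bad candidates. The sketch below is a hypothetical helper, assuming a two-dimensional state; it checks the strict derivative condition on circles that exclude the origin, mirroring the annulus 0.01 ≤ x₁² + x₂² ≤ 1 used in the examples that follow.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Vec = Tuple[float, float]


def find_lyapunov_violation(
    v: Callable[[Vec], float],
    grad_v: Callable[[Vec], Vec],
    f: Callable[[Vec], Vec],
    radii: Sequence[float],
    n_angles: int = 64,
) -> Optional[Vec]:
    """Sample points on circles of the given radii (origin excluded) and return
    the first one violating v(x) > 0 or grad_v(x) . f(x) < 0, or None."""
    for r in radii:
        for k in range(n_angles):
            theta = 2 * math.pi * k / n_angles
            x = (r * math.cos(theta), r * math.sin(theta))
            g, fx = grad_v(x), f(x)
            if v(x) <= 0 or g[0] * fx[0] + g[1] * fx[1] >= 0:
                return x
    return None
```

For example, v(x) = x₁² + x₂² passes for the stable system ẋ = −x and is rejected for the unstable system ẋ = x.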


In the following, we show that we can handle two examples from [29].

Normalized Pendulum. Given a standard pendulum system with normalized parameters

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = -\sin(x_1) - x_2$$

and a quadratic template for a Lyapunov function $v(x) = x^{T} P x = c_1 x_1 x_2 + c_2 x_1^2 + c_3 x_2^2$, we can encode this synthesis problem into the following ∃∀-formula:

$$\exists c_1 c_2 c_3\, \forall x_1 x_2\, \Big[ \big( (50 c_3 x_1 x_2 + 50 x_1^2 c_1 + 50 x_2^2 c_2 > 0.5) \wedge (100 c_1 x_1 x_2 + 50 x_2^2 c_3 + (-x_2 - \sin(x_1))(50 x_1 c_3 + 100 x_2 c_2) < -0.5) \big) \vee \neg\big( (0.01 \leq x_1^2 + x_2^2) \wedge (x_1^2 + x_2^2 \leq 1) \big) \Big]$$

Our prototype solver takes 44.184 s to synthesize the following function as a solution to the problem for the bound $\|x\| \in [0.1, 1.0]$ and $c_i \in [0.1, 100]$ using δ = 0.05:

$$v = 40.6843\, x_1 x_2 + 35.6870\, x_1^2 + 84.3906\, x_2^2.$$
Damped Mathieu System. Mathieu dynamics are time-varying and defined by the following ODEs:

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = -x_2 - (2 + \sin(t))\, x_1$$

Using a quadratic template for a Lyapunov function $v(x) = x^{T} P x = c_1 x_1 x_2 + c_2 x_1^2 + c_3 x_2^2$, we can encode this synthesis problem into the following ∃∀-formula:

$$\exists c_1 c_2 c_3\, \forall x_1 x_2 t\, \Big[ \big( (50 x_1 x_2 c_2 + 50 x_1^2 c_1 + 50 x_2^2 c_3 > 0) \wedge (100 c_1 x_1 x_2 + 50 x_2^2 c_2 + (-x_2 - x_1 (2 + \sin(t)))(50 x_1 c_2 + 100 x_2 c_3) < 0) \big) \vee \neg\big( (0.01 \leq x_1^2 + x_2^2) \wedge (0.1 \leq t) \wedge (t \leq 1) \wedge (x_1^2 + x_2^2 \leq 1) \big) \Big]$$

Our prototype solver takes 26.533 s to synthesize the following function as a solution to the problem for the bound $\|x\| \in [0.1, 1.0]$, $t \in [0.1, 1.0]$, and $c_i \in [45, 98]$ using δ = 0.05:

$$v = 54.6950\, x_1 x_2 + 90.2849\, x_1^2 + 50.5376\, x_2^2.$$


6 Conclusion 


We have described delta-decision procedures for solving exists-forall formulas 
in the first-order theory over the reals with computable real functions. These 
formulas can encode a wide range of hard practical problems such as general 
constrained optimization and nonlinear control synthesis. We use a branch-and- 
prune framework, and design special pruning operators for universally-quantified 
constraints such that the procedures can be proved to be delta-complete, where 
suitable control of numerical errors is crucial. We demonstrated the effective- 
ness of the procedures on various global optimization and Lyapunov function 
synthesis problems. 
Abstract. We present a novel approach for solving quantified bit-vector
formulas in Satisfiability Modulo Theories (SMT) based on computing 
symbolic inverses of bit-vector operators. We derive conditions that pre- 
cisely characterize when bit-vector constraints are invertible for a rep- 
resentative set of bit-vector operators commonly supported by SMT 
solvers. We utilize syntax-guided synthesis techniques to aid in estab- 
lishing these conditions and verify them independently by using several 
SMT solvers. We show that invertibility conditions can be embedded into 
quantifier instantiations using Hilbert choice expressions, and give exper- 
imental evidence that a counterexample-guided approach for quantifier 
instantiation utilizing these techniques leads to performance improve- 
ments with respect to state-of-the-art solvers for quantified bit-vector 
constraints. 


1 Introduction 


Many applications in hardware and software verification rely on Satisfiability 
Modulo Theories (SMT) solvers for bit-precise reasoning. In recent years, the 
quantifier-free fragment of the theory of fixed-size bit-vectors has received a lot 
of interest, as witnessed by the number of applications that generate problems 
in that fragment and by the high, and increasing, number of solvers that par- 
ticipate in the corresponding division of the annual SMT competition. Modeling 
properties of programs and circuits, e.g., universal safety properties and pro- 
gram invariants, however, often requires the use of quantified bit-vector formulas. 
Despite a multitude of applications, reasoning efficiently about such formulas is 
still a challenge in the automated reasoning community. 

The majority of solvers that support quantified bit-vector logics employ instantiation-based techniques [8,21,22,25], which aim to find conflicting ground instances of quantified formulas. For that, it is crucial to select good instantiations for the universal variables, or else the solver may be overwhelmed by the number of ground instances generated. For example, consider a quantified formula $\psi = \forall x.\, (x + s \not\approx t)$ where $x$, $s$ and $t$ denote bit-vectors of size 32. To prove that $\psi$ is unsatisfiable we can instantiate $x$ with all $2^{32}$ possible bit-vector values. However, ideally, we would like to find a proof that requires far fewer instantiations. In this example, if we instantiate $x$ with the symbolic term $t - s$ (the inverse of $x + s \approx t$ when solved for $x$), we can immediately conclude that $\psi$ is unsatisfiable since $(t - s) + s \not\approx t$ simplifies to false.

Operators in the theory of bit-vectors are not always invertible. However, we observe that it is possible to identify quantifier-free conditions that precisely characterize when they are. We do that for a representative set of operators in the standard theory of bit-vectors supported by SMT solvers. For example, we have proven that the constraint $x \cdot s \approx t$ is solvable for $x$ if and only if $(-s \mid s) \mathbin{\&} t \approx t$ is satisfiable. Using this observation, we develop a novel approach for solving quantified bit-vector formulas that utilizes invertibility conditions to generate symbolic instantiations. We show that invertibility conditions can be embedded into quantifier instantiations using Hilbert choice functions in a sound manner. This approach has compelling advantages with respect to previous approaches, which we demonstrate in our experiments.

More specifically, this paper makes the following contributions. 


— We derive and present invertibility conditions for a representative set of bit- 
vector operators that allow us to model all bit-vector constraints in SMT- 
LIB [3]. 

— We provide details on how invertibility conditions can be automatically synthesized using syntax-guided synthesis (SyGuS) [1] techniques, and make publicly available 162 challenge problems for SyGuS solvers that are encodings of this task.

— We prove that our approach can efficiently reduce a class of quantified for- 
mulas, which we call unit linear invertible, to quantifier-free constraints. 

— Leveraging invertibility conditions, we implement a novel quantifier instan- 
tiation scheme as an extension of the SMT solver CVC4 [2], which shows 
improvements with respect to state-of-the-art solvers for quantified bit-vector 
constraints. 


Related Work. Quantified bit-vector logics are currently supported by the SMT 
solvers Boolector [16], CVC4 [2], Yices [7], and Z3 [6] and a Binary Decision Dia- 
gram (BDD)-based tool called Q3B [14]. Out of these, only CVC4 and Z3 provide 
support for combining quantified bit-vectors with other theories, e.g., the theories 
of arrays or real arithmetic. Arbitrarily nested quantifiers are handled by all but 
Yices, which only supports bit-vector formulas of the form $\exists x \forall y.\, \varphi[x, y]$ [8]. For
quantified bit-vectors, CVC4 employs counterexample-guided quantifier instan- 
tiation (CEGQI) [22], where concrete models of a set of ground instances and the 
negation of the input formula (the counterexamples) serve as instantiations for 
the universal variables. In Z3, model-based quantifier instantiation (MBQI) [10] 
is combined with a template-based model finding procedure [25]. In contrast to 
CVC4, Z3 not only relies on concrete counterexamples as candidates for quan- 
tifier instantiation but generalizes these counterexamples to generate symbolic 
instantiations by selecting ground terms with the same model value. Boolector employs a syntax-guided synthesis approach to synthesize interpretations for
Skolem functions based on a set of ground instances of the formula, and uses a 
counterexample refinement loop similar to MBQI [21]. Other counterexample- 
guided approaches for quantified formulas in SMT solvers have been considered 
by Bjørner and Janota [4] and by Reynolds et al. [23], but they have mostly
targeted quantified linear arithmetic and do not specifically address bit-vectors. 
Quantifier elimination for a fragment of bit-vectors that covers modular linear 
arithmetic has been recently addressed by John and Chakraborty [13], although 
we do not explore that direction in this paper. 


2 Preliminaries 


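We assume the usual notions and terminology of many-sorted first-order logic with equality (denoted by $\approx$). Let $S$ be a set of sort symbols, and for every sort $\sigma \in S$ let $X_\sigma$ be an infinite set of variables of sort $\sigma$. We assume that sets $X_\sigma$ are pairwise disjoint and define $X$ as the union of sets $X_\sigma$. Let $\Sigma$ be a signature consisting of a set $\Sigma^s \subseteq S$ of sort symbols and a set $\Sigma^f$ of interpreted (and sorted) function symbols $f^{\sigma_1 \cdots \sigma_n \sigma}$ with arity $n \ge 0$ and $\sigma_1, \ldots, \sigma_n, \sigma \in \Sigma^s$. We assume that a signature $\Sigma$ includes a Boolean sort Bool and the Boolean constants $\top$ (true) and $\bot$ (false). Let $\mathcal{I}$ be a $\Sigma$-interpretation that maps: each $\sigma \in \Sigma^s$ to a non-empty set $\sigma^{\mathcal{I}}$ (the domain of $\mathcal{I}$), with $\mathit{Bool}^{\mathcal{I}} = \{\top, \bot\}$; each $x \in X_\sigma$ to an element $x^{\mathcal{I}} \in \sigma^{\mathcal{I}}$; and each $f^{\sigma_1 \cdots \sigma_n \sigma} \in \Sigma^f$ to a total function $f^{\mathcal{I}} \colon \sigma_1^{\mathcal{I}} \times \cdots \times \sigma_n^{\mathcal{I}} \to \sigma^{\mathcal{I}}$ if $n > 0$, and to an element in $\sigma^{\mathcal{I}}$ if $n = 0$. If $x \in X_\sigma$ and $v \in \sigma^{\mathcal{I}}$, we denote by $\mathcal{I}[x \mapsto v]$ the interpretation that maps $x$ to $v$ and is otherwise identical to $\mathcal{I}$. We use the usual inductive definition of a satisfiability relation $\models$ between $\Sigma$-interpretations and $\Sigma$-formulas.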

We assume the usual definition of well-sorted terms, literals, and formulas as Bool terms with variables in $X$ and symbols in $\Sigma$, and refer to them as $\Sigma$-terms, $\Sigma$-atoms, and so on. A ground term/formula is a $\Sigma$-term/formula without variables. We define $x = (x_1, \ldots, x_n)$ as a tuple of variables and write $Q x\, \varphi$ with $Q \in \{\forall, \exists\}$ for a quantified formula $Q x_1 \cdots Q x_n\, \varphi$. We use $\mathrm{Lit}(\varphi)$ to denote the set of $\Sigma$-literals of $\Sigma$-formula $\varphi$. For a $\Sigma$-term or $\Sigma$-formula $e$, we denote the free variables of $e$ (defined as usual) as $FV(e)$ and use $e[x]$ to denote that the variables in $x$ occur free in $e$. For a tuple of $\Sigma$-terms $t = (t_1, \ldots, t_n)$, we write $e[t]$ for the term or formula obtained from $e$ by simultaneously replacing each occurrence of $x_i$ in $e$ by $t_i$. Given a $\Sigma$-formula $\varphi[x]$ with $x \in X_\sigma$, we use Hilbert's choice operator $\varepsilon$ [12] to describe properties of $x$. We define a choice function $\varepsilon x.\, \varphi[x]$ as a term where $x$ is bound by $\varepsilon$. In every interpretation $\mathcal{I}$, $\varepsilon x.\, \varphi[x]$ denotes some value $v \in \sigma^{\mathcal{I}}$ such that $\mathcal{I}[x \mapsto v]$ satisfies $\varphi[x]$ if such values exist, and denotes an arbitrary element of $\sigma^{\mathcal{I}}$ otherwise. This means that the formula $\exists x.\, \varphi[x] \Leftrightarrow \varphi[\varepsilon x.\, \varphi[x]]$ is satisfied by every interpretation.
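Over a finite domain such as the bit-vectors of a fixed width, one admissible interpretation of a choice term is easy to make concrete. The Python sketch below (the helper names `choice` and `check_choice_property` and the width 4 are illustrative assumptions) returns the first witness if one exists and a fixed default element otherwise, then checks $\exists x.\, \varphi[x] \Leftrightarrow \varphi[\varepsilon x.\, \varphi[x]]$ exhaustively for the family $\varphi[x] := x \cdot s \approx t$. Any other rule for picking a witness would satisfy the same property; taking the first one is merely a convenient deterministic choice.

```python
from typing import Callable, Iterable

def choice(pred: Callable[[int], bool], domain: Iterable[int], default: int = 0) -> int:
    """One admissible value of the choice term (eps x. pred(x)) over a finite domain:
    some witness if one exists, otherwise an arbitrary element (here: `default`)."""
    for v in domain:
        if pred(v):
            return v
    return default

def check_choice_property(width: int = 4) -> bool:
    """Check (exists x. phi[x]) <=> phi[eps x. phi[x]] for phi[x] := x * s = t (mod 2**width)."""
    domain = range(1 << width)
    mask = (1 << width) - 1
    for s in domain:
        for t in domain:
            def phi(x: int) -> bool:
                return (x * s) & mask == t
            if any(phi(x) for x in domain) != phi(choice(phi, domain)):
                return False
    return True
```

For example, `choice(lambda x: (3 * x) % 16 == 1, range(16))` yields 11, the inverse of 3 modulo 16, whereas a predicate with no witness, such as $x \cdot 2 \approx 3$, simply yields the default.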

A theory $T$ is a pair $(\Sigma, \mathbf{I})$, where $\Sigma$ is a signature and $\mathbf{I}$ is a non-empty class of $\Sigma$-interpretations (the models of $T$) that is closed under variable reassignment, i.e., every $\Sigma$-interpretation that only differs from an $\mathcal{I} \in \mathbf{I}$ in how it interprets variables is also in $\mathbf{I}$. A $\Sigma$-formula $\varphi$ is $T$-satisfiable (resp. $T$-unsatisfiable) if it is satisfied by some (resp. no) interpretation in $\mathbf{I}$; it is $T$-valid if it is satisfied by all interpretations in $\mathbf{I}$. A choice function $\varepsilon x.\, \varphi[x]$ is ($T$-)valid if $\exists x.\, \varphi[x]$ is ($T$-)valid. We refer to a term $t$ as $\varepsilon$-($T$-)valid if all occurrences of choice functions in $t$ are ($T$-)valid. We will sometimes omit $T$ when the theory is understood from context.

Table 1. Set of considered bit-vector operators with corresponding SMT-LIB 2 syntax.

| Symbol | SMT-LIB syntax | Sort |
|---|---|---|
| $\approx, <_u, >_u, <_s, >_s$ | =, bvult, bvugt, bvslt, bvsgt | $\sigma_{[n]} \times \sigma_{[n]} \to \mathit{Bool}$ |
| ${\sim}, -$ | bvnot, bvneg | $\sigma_{[n]} \to \sigma_{[n]}$ |
| $\&, \mid, \ll, \gg, \gg_a$ | bvand, bvor, bvshl, bvlshr, bvashr | $\sigma_{[n]} \times \sigma_{[n]} \to \sigma_{[n]}$ |
| $+, \cdot, \bmod, \div$ | bvadd, bvmul, bvurem, bvudiv | $\sigma_{[n]} \times \sigma_{[n]} \to \sigma_{[n]}$ |
| $\circ$ | concat | $\sigma_{[n]} \times \sigma_{[m]} \to \sigma_{[n+m]}$ |
| $[u : l]$ | extract | $\sigma_{[n]} \to \sigma_{[u-l+1]}$, $0 \le l \le u < n$ |

We will focus on the theory $T_{BV} = (\Sigma_{BV}, \mathbf{I}_{BV})$ of fixed-size bit-vectors as defined by the SMT-LIB 2 standard [3]. The signature $\Sigma_{BV}$ includes a unique sort for each positive bit-vector width $n$, denoted here as $\sigma_{[n]}$. Similarly, $X_{[n]}$ is the set of bit-vector variables of sort $\sigma_{[n]}$, and $X_{BV}$ is the union of all sets $X_{[n]}$. We assume that $\Sigma_{BV}$ includes all bit-vector constants of sort $\sigma_{[n]}$ for each $n$, represented as bit-strings. However, to simplify the notation we will sometimes denote them by the corresponding natural number in $\{0, \ldots, 2^n - 1\}$. All interpretations $\mathcal{I} \in \mathbf{I}_{BV}$ are identical except for the value they assign to variables. They interpret sort and function symbols as specified in SMT-LIB 2. All function symbols in $\Sigma^f_{BV}$ are overloaded for every $\sigma_{[n]} \in \Sigma^s_{BV}$. We denote a $\Sigma_{BV}$-term (or bit-vector term) $t$ of width $n$ as $t_{[n]}$ when we want to specify its bit-width explicitly. We use $\mathit{max}_{s[n]}$ or $\mathit{min}_{s[n]}$ for the maximum or minimum signed value of width $n$, e.g., $\mathit{max}_{s[4]} = 0111$ and $\mathit{min}_{s[4]} = 1000$. The width of a bit-vector sort or term is given by the function $\kappa$, e.g., $\kappa(\sigma_{[n]}) = n$ and $\kappa(t_{[n]}) = n$.

Without loss of generality, we consider a restricted set of bit-vector function symbols (or bit-vector operators) $\Sigma^f_{BV}$ as listed in Table 1. The selection of operators in this set is arbitrary but complete in the sense that it suffices to express all bit-vector operators defined in SMT-LIB 2.
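To make the operator semantics of Table 1 concrete, the following Python sketch models $n$-bit values as non-negative integers below $2^n$. All function names are hypothetical. The corner cases follow the SMT-LIB 2 conventions: unsigned division by zero yields the all-ones value, unsigned remainder by zero yields the dividend, and shifting by $n$ or more bits shifts out every bit (arithmetic right shift then leaves only copies of the sign bit).

```python
def mask(n: int) -> int:
    """All-ones value of width n (written ~0 in the text)."""
    return (1 << n) - 1

def to_signed(v: int, n: int) -> int:
    """Two's complement reading of an n-bit value."""
    return v - (1 << n) if v >> (n - 1) else v

def bv_ops(n: int) -> dict:
    """Table 1 operators on n-bit values held as Python ints in [0, 2**n)."""
    m = mask(n)
    return {
        "bvnot":  lambda a: ~a & m,
        "bvneg":  lambda a: -a & m,
        "bvand":  lambda a, b: a & b,
        "bvor":   lambda a, b: a | b,
        "bvshl":  lambda a, b: (a << b) & m if b < n else 0,
        "bvlshr": lambda a, b: a >> b if b < n else 0,
        # arithmetic shift: shifting by n - 1 or more leaves only copies of the sign bit
        "bvashr": lambda a, b: (to_signed(a, n) >> min(b, n - 1)) & m,
        "bvadd":  lambda a, b: (a + b) & m,
        "bvmul":  lambda a, b: (a * b) & m,
        # SMT-LIB 2: division by zero gives all ones, remainder by zero gives the dividend
        "bvudiv": lambda a, b: m if b == 0 else a // b,
        "bvurem": lambda a, b: a if b == 0 else a % b,
        "bvult":  lambda a, b: a < b,
        "bvugt":  lambda a, b: a > b,
        "bvslt":  lambda a, b: to_signed(a, n) < to_signed(b, n),
        "bvsgt":  lambda a, b: to_signed(a, n) > to_signed(b, n),
    }

def concat(a: int, b: int, width_b: int) -> int:
    """a o b: a supplies the high bits, b (of width width_b) the low bits."""
    return (a << width_b) | b

def extract(a: int, u: int, l: int) -> int:
    """a[u:l] for 0 <= l <= u < width of a."""
    return (a >> l) & mask(u - l + 1)
```

For instance, with `ops = bv_ops(4)`, `ops["bvslt"](0b1000, 0b0111)` holds ($\mathit{min}_{s[4]} <_s \mathit{max}_{s[4]}$) while `ops["bvult"](0b1000, 0b0111)` does not.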


3 Invertibility Conditions for Bit-Vector Constraints 


This section formally introduces the concept of an invertibility condition and 
shows that such conditions can be used to construct symbolic solutions for a 
class of quantifier-free bit-vector constraints that have a linear shape. 

Consider a bit-vector literal $x + s \approx t$ and assume that we want to solve for $x$. If the literal is linear in $x$, that is, has only one occurrence of $x$, a general solution for $x$ is given by the inverse of bit-vector addition over equality: $x = t - s$. Computing the inverse of a bit-vector operation, however, is not always possible. For example, for $x \cdot s \approx t$, an inverse always exists only if $s$ always evaluates to an odd bit-vector. Otherwise, there are values for $s$ and $t$ where no such inverse exists, e.g., $x \cdot 2 \approx 3$, since $x \cdot 2$ is always even while 3 is odd. However, even if there is no unconditional inverse for the general case, we can identify the condition under which a bit-vector operation is invertible. For the bit-vector multiplication constraint $x \cdot s \approx t$ with $x \notin FV(s) \cup FV(t)$, the invertibility condition for $x$ can be expressed by the formula $(-s \mid s) \mathbin{\&} t \approx t$.
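A small worked instance (values chosen by us for illustration, at width $n = 3$) shows how the condition singles out the solvable right-hand sides:

$$
\begin{aligned}
s = 010_2:\quad & -s = 110_2, \qquad (-s \mid s) = 110_2,\\
t = 011_2:\quad & (110_2 \mathbin{\&} 011_2) = 010_2 \not\approx t \;\Rightarrow\; x \cdot s \approx t \text{ has no solution},\\
t = 110_2:\quad & (110_2 \mathbin{\&} 110_2) = 110_2 \approx t \;\Rightarrow\; x = 011_2 \text{ is a solution, since } 3 \cdot 2 = 6.
\end{aligned}
$$

In general, $-s \mid s$ has exactly the bits at and above the lowest set bit of $s$ set, so the condition states that $t$ is a multiple of the largest power of two dividing $s$; for $s = 0$ it reduces to $t \approx 0$.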


Definition 1 (Invertibility Condition). Let $\ell[x]$ be a $\Sigma_{BV}$-literal. A quantifier-free $\Sigma_{BV}$-formula $\phi_c$ is an invertibility condition for $x$ in $\ell[x]$ if $x \notin FV(\phi_c)$ and $\phi_c \Leftrightarrow \exists x.\, \ell[x]$ is $T_{BV}$-valid.


An invertibility condition for a literal $\ell[x]$ provides the exact conditions under which $\ell[x]$ is solvable for $x$. We call it an “invertibility” condition because we can use Hilbert choice functions to express all such conditional solutions with a single symbolic term, that is, a term whose possible values are exactly the solutions for $x$ in $\ell[x]$. Recall that a choice function $\varepsilon y.\, \varphi[y]$ represents a solution for a formula $\varphi[x]$ if there exists one, and represents an arbitrary value otherwise. We may use a choice function to describe inverse solutions for a literal $\ell[x]$ with invertibility condition $\phi_c$ as $\varepsilon y.\, (\phi_c \Rightarrow \ell[y])$. For example, for the general case of bit-vector multiplication over equality the choice function is defined as $\varepsilon y.\, ((-s \mid s) \mathbin{\&} t \approx t \Rightarrow y \cdot s \approx t)$.


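Lemma 2. If $\phi_c$ is an invertibility condition for an $\varepsilon$-valid $\Sigma_{BV}$-literal $\ell[x]$ and $r$ is the term $\varepsilon y.\, (\phi_c \Rightarrow \ell[y])$, then $r$ is $\varepsilon$-valid and $\ell[r] \Leftrightarrow \exists x.\, \ell[x]$ is $T_{BV}$-valid.¹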


Intuitively, the lemma states that when $\ell[x]$ is satisfiable (under condition $\phi_c$), any value returned by the choice function $\varepsilon y.\, (\phi_c \Rightarrow \ell[y])$ is a solution of $\ell[x]$ (and thus $\exists x.\, \ell[x]$ holds). Conversely, if there exists a value $v$ for $x$ that makes $\ell[x]$ true, then there is a model of $T_{BV}$ that interprets $\varepsilon y.\, (\phi_c \Rightarrow \ell[y])$ as $v$.

Now, suppose that $\Sigma_{BV}$-literal $\ell$ is again linear in $x$ but that $x$ occurs arbitrarily deep in $\ell$. Consider, for example, a literal $s_1 \cdot (s_2 + x) \approx t$ where $x$ does not occur in $s_1$, $s_2$ or $t$. We can solve this literal for $x$ by recursively computing the (possibly conditional) inverses of all bit-vector operations that involve $x$. That is, first we solve $s_1 \cdot x' \approx t$ for $x'$, where $x'$ is a fresh variable abstracting $s_2 + x$, which yields the choice function $x' = \varepsilon y.\, ((-s_1 \mid s_1) \mathbin{\&} t \approx t \Rightarrow s_1 \cdot y \approx t)$. Then, we solve $s_2 + x \approx x'$ for $x$, which yields the solution $x = x' - s_2 = \varepsilon y.\, ((-s_1 \mid s_1) \mathbin{\&} t \approx t \Rightarrow s_1 \cdot y \approx t) - s_2$.

¹ All proofs can be found in an extended version of this paper [19].

Figure 1 describes in pseudo code the procedure to solve for $x$ in an arbitrary literal $\ell[x] = e[x] \bowtie t$ that is linear in $x$. We assume that $e[x]$ is built over the set of bit-vector operators listed in Table 1. Function solve recursively constructs a symbolic solution by computing (conditional) inverses as follows. Let function getInverse$(x, \ell[x])$ return a term $t$ that is the inverse of $x$ in $\ell[x]$, i.e., such that $\ell[x] \Leftrightarrow x \approx t$. Furthermore, let function getIC$(x, \ell[x])$ return the invertibility condition $\phi_c$ for $x$ in $\ell[x]$. If $e[x]$ has the form $\diamond(e_1, \ldots, e_n)$ with $n > 0$, $x$ must occur in exactly one of the subterms $e_1, \ldots, e_n$ given that $e$ is linear in $x$. Let $d$ be the term obtained from $e$ by replacing $e_i$ (the subterm containing $x$) with a fresh variable $x'$. We solve for subterm $e_i[x]$ (treating it as a variable $x'$) and compute an inverse getInverse$(x', d[x'] \approx t)$, if it exists. Note that for a disequality $e[x] \not\approx t$, it suffices to compute the inverse over equality and propagate the disequality down. (For example, for $e_i[x] + s \not\approx t$, we compute the inverse $t' = \mathrm{getInverse}(x', x' + s \approx t) = t - s$ and recurse on $e_i[x] \not\approx t'$.) If no inverse for $e[x] \bowtie t$ exists, we first determine the invertibility condition $\phi_c$ for $x'$ in $d[x']$ via getIC$(x', d[x'] \bowtie t)$, construct the choice function $\varepsilon y.\, (\phi_c \Rightarrow d[y] \bowtie t)$, and set it equal to $e_i[x]$, before recursively solving for $x$. If $e[x] = x$ and the given literal is an equality, we have reached the base case and return $t$ as the solution for $x$. Note that in Fig. 1, for simplicity we omitted one case for which an inverse can be determined, namely $x \cdot c \approx t$ where $c$ is an odd constant.

solve$(x, e[x] \bowtie t)$:
    If $e = x$
        If $\bowtie \in \{\approx\}$ then return $t$
        else return $\varepsilon y.\, (\mathrm{getIC}(x, x \bowtie t) \Rightarrow y \bowtie t)$.
    else $e = \diamond(e_1, \ldots, e_i[x], \ldots, e_n)$ with $n > 0$ and $x \notin FV(e_j)$ for all $j \neq i$.
        Let $d[x'] = \diamond(e_1, \ldots, e_{i-1}, x', e_{i+1}, \ldots, e_n)$ where $x'$ is a fresh variable.
        If $\bowtie \in \{\approx, \not\approx\}$ and $\diamond \in \{{\sim}, -, +\}$
            then let $t' = \mathrm{getInverse}(x', d[x'] \approx t)$ and return solve$(x, e_i \bowtie t')$
            else let $\phi_c = \mathrm{getIC}(x', d[x'] \bowtie t)$ and return solve$(x, e_i \approx \varepsilon y.\, (\phi_c \Rightarrow d[y] \bowtie t))$.

Fig. 1. Function solve for constructing a symbolic solution for $x$ given a linear literal $e[x] \bowtie t$.
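The recursion of Fig. 1 can be prototyped in a few lines. The Python sketch below is illustrative only: it handles equality literals over $\{{\sim}, -, +, \cdot\}$, uses the multiplication condition $(-s \mid s) \mathbin{\&} t \approx t$ as getIC, and interprets a choice term by exhaustive search at a small width (any witness would do). The tuple term representation and the names `solve`, `evaluate`, and `choice_mul` are our own assumptions, not the CVC4 implementation.

```python
X = ("x",)  # the variable being solved for; ("sym", name) is a symbolic constant

def has_x(e):
    """True if the term e contains the variable x."""
    if e == X:
        return True
    if e[0] == "sym":
        return False
    return any(has_x(a) for a in e[1:])

def solve(e, t):
    """Symbolic solution for x in the literal e = t, where e is linear in x (cf. Fig. 1)."""
    if e == X:
        return t                                    # base case: x = t
    op, args = e[0], e[1:]
    if op == "not":                                 # ~e1 = t   <=>  e1 = ~t
        return solve(args[0], ("not", t))
    if op == "neg":                                 # -e1 = t   <=>  e1 = -t
        return solve(args[0], ("neg", t))
    i = 0 if has_x(args[0]) else 1                  # position of the subterm containing x
    ei, other = args[i], args[1 - i]
    if op == "add":                                 # ei + o = t  <=>  ei = t - o
        return solve(ei, ("add", t, ("neg", other)))
    if op == "mul":                                 # conditional inverse via a choice term
        return solve(ei, ("choice_mul", other, t))  # eps y. (IC(o, t) => y * o = t)
    raise ValueError(f"operator {op!r} is not covered by this sketch")

def evaluate(e, env, n):
    """Evaluate a term on n-bit values; env maps symbol names (and 'x') to ints."""
    m = (1 << n) - 1
    op = e[0]
    if op == "x":
        return env["x"]
    if op == "sym":
        return env[e[1]]
    if op == "not":
        return ~evaluate(e[1], env, n) & m
    if op == "neg":
        return -evaluate(e[1], env, n) & m
    if op in ("add", "mul"):
        a, b = evaluate(e[1], env, n), evaluate(e[2], env, n)
        return (a + b) & m if op == "add" else (a * b) & m
    if op == "choice_mul":
        s, t = evaluate(e[1], env, n), evaluate(e[2], env, n)
        if ((-s | s) & t) == t:                     # invertibility condition holds
            return next(y for y in range(m + 1) if (y * s) & m == t)
        return 0                                    # arbitrary value otherwise
    raise ValueError(op)

if __name__ == "__main__":
    lhs = ("mul", ("sym", "s1"), ("add", ("sym", "s2"), X))   # s1 * (s2 + x)
    print(solve(lhs, ("sym", "t")))
    # ('add', ('choice_mul', ('sym', 's1'), ('sym', 't')), ('neg', ('sym', 's2')))
```

The printed term is exactly the solution $\varepsilon y.\, ((-s_1 \mid s_1) \mathbin{\&} t \approx t \Rightarrow s_1 \cdot y \approx t) - s_2$ derived by hand for $s_1 \cdot (s_2 + x) \approx t$. Evaluating it for every valuation of $s_1, s_2, t$ at width 4 confirms that $\ell[r]$ holds exactly when $\exists x.\, \ell[x]$ does, in line with Theorem 3.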


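Theorem 3. Let $\ell[x]$ be an $\varepsilon$-valid $\Sigma_{BV}$-literal linear in $x$, and let $r = \mathrm{solve}(x, \ell[x])$. Then $r$ is $\varepsilon$-valid, $FV(r) \subseteq FV(\ell) \setminus \{x\}$ and $\ell[r] \Leftrightarrow \exists x.\, \ell[x]$ is $T_{BV}$-valid.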


Tables 2 and 3 list the invertibility conditions for bit-vector operators $\{\cdot, \bmod, \div, \&, \mid, \gg, \gg_a, \ll, \circ\}$ over relations $\{\approx, \not\approx, <_u, >_u\}$. Due to space restrictions we omit the conditions for signed inequalities since they can be expressed in terms of unsigned inequality. We omit the invertibility conditions over $\{\le_u, \ge_u\}$ since they can generally be constructed by combining the corresponding conditions for equality and inequality, although there might be more succinct equivalent conditions. Finally, we omit the invertibility conditions for operators $\{{\sim}, -, +\}$ and literals $x \bowtie t$ over inequality since they are basic bounds checks, e.g., for $x <_s t$ we have $t \not\approx \mathit{min}_s$. The invertibility condition for $x \not\approx t$ and for the extract operator is $\top$.²


² All the omitted invertibility conditions can be found in the extended version of this paper [19].
The idea of computing the inverse of bit-vector operators has been used
successfully in a recent local search approach for solving quantifier-free bit-vector 
constraints by Niemetz et al. [17]. There, target values are propagated via inverse 
value computation. In contrast, our approach does not determine single inverse 
values based on concrete assignments but aims at finding symbolic solutions 
through the generation of conditional inverses. In an extended version of that 
work [18], the same authors present rules for inverse value computation over 
equality but they provide no proof of correctness for them. We define invertibility 
conditions not only over equality but also disequality and (un)signed inequality, 
and verify their correctness up to a certain bit-width. 


3.1 Synthesizing Invertibility Conditions 


We have defined invertibility conditions for all bit-vector operators in $\Sigma_{BV}$ where no general inverse exists (162 in total). A noteworthy aspect of this work is that we were able to leverage syntax-guided synthesis (SyGuS) technology [1] to help identify these conditions. The problem of finding invertibility conditions for a literal of the form $x \diamond s \bowtie t$ (or, dually, $s \diamond x \bowtie t$) linear in $x$ can be recast as a SyGuS problem by asking whether there exists a binary Boolean function $C$ such that the (second-order) formula $\exists C \forall s \forall t.\, ((\exists x.\, x \diamond s \bowtie t) \Leftrightarrow C(s, t))$ is satisfiable. If a SyGuS solver is able to synthesize the function $C$, then $C$ can be used as the invertibility condition for $x \diamond s \bowtie t$. To simplify the SyGuS problem we chose a bit-width of 4 for $x$, $s$, and $t$ and eliminated the quantification over $x$ in the formula above by expanding it to

$$\exists C \forall s \forall t.\, \Big(\bigvee_{i=0}^{15} i \diamond s \bowtie t\Big) \Leftrightarrow C(s, t)$$


Since the search space for SyGuS solvers heavily depends on the input gram- 
mar (which defines the solution space for C), we decided to use two gram- 
mars with the same set of Boolean connectives but different sets of bit-vector 
operators: 


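$$O_1 = \{\neg, \wedge, \vee, \approx, <_u, <_s, 0, \mathit{min}_s, \mathit{max}_s, s, t, {\sim}, -, \&, \mid\}$$
$$O_2 = \{\neg, \wedge, \vee, \approx, <_u, <_s, \ge_u, \ge_s, 0, \mathit{min}_s, \mathit{max}_s, s, t, {\sim}, +, -, \&, \mid, \gg, \ll\}$$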


The selection of constants in the grammar turned out to be crucial for finding solutions, e.g., by adding $\mathit{min}_s$ and $\mathit{max}_s$, we were able to synthesize substantially more invertibility conditions for signed inequalities. For each of the two sets of operators, we generated 140 SyGuS problems, one for each combination of bit-vector operator $\diamond \in \{\cdot, \bmod, \div, \&, \mid, \gg, \gg_a, \ll\}$ over relation $\bowtie \in \{\approx, \not\approx, <_u, \le_u, >_u, \ge_u, <_s, \le_s, >_s, \ge_s\}$, and used the SyGuS extension of the CVC4 solver [22] to solve these problems.
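To illustrate the idea in miniature, the Python sketch below replaces the SyGuS solver with a naive enumerative search over a tiny grammar of our own choosing (leaves $s$, $t$, $0$, ${\sim}0$ and operators ${\sim}$, $-$, $\&$, $\mid$, combined into equalities; this grammar is an assumption for illustration, not $O_1$ or $O_2$). The specification is the truth table of $\exists x.\, x \mathbin{\&} s \approx t$ at bit-width 4, i.e., the expanded disjunction above evaluated on all 256 pairs $(s, t)$. A predicate found this way is only guaranteed to be correct at width 4, which is why the conditions are verified separately for other widths. CVC4's SyGuS engine uses far more sophisticated enumeration and pruning.

```python
from itertools import product

N = 4                                              # bit-width used in the SyGuS problems
MASK = (1 << N) - 1
INPUTS = list(product(range(1 << N), repeat=2))    # every (s, t) pair

def spec(s, t):
    """Does x & s = t have a solution x?  (brute force over all 4-bit x)"""
    return any((x & s) == t for x in range(1 << N))

TARGET = tuple(spec(s, t) for s, t in INPUTS)

def grammar_terms():
    """Terms of size <= 3 over leaves {s, t, 0, ~0} with ~, -, &, |.
    Each term is (size, text, vector of its values on INPUTS)."""
    leaves = [(1, "s", tuple(s for s, _ in INPUTS)),
              (1, "t", tuple(t for _, t in INPUTS)),
              (1, "0", tuple(0 for _ in INPUTS)),
              (1, "~0", tuple(MASK for _ in INPUTS))]
    terms = list(leaves)
    for _, a, va in leaves:
        terms.append((2, f"~{a}", tuple(~v & MASK for v in va)))
        terms.append((2, f"-{a}", tuple(-v & MASK for v in va)))
    for (_, a, va), (_, b, vb) in product(leaves, repeat=2):
        terms.append((3, f"({a} & {b})", tuple(x & y for x, y in zip(va, vb))))
        terms.append((3, f"({a} | {b})", tuple(x | y for x, y in zip(va, vb))))
    return terms

def synthesize():
    """Smallest predicate 'e1 = e2' (by total term size) that agrees with TARGET everywhere."""
    terms = grammar_terms()
    pairs = sorted(product(terms, repeat=2), key=lambda p: p[0][0] + p[1][0])
    for (_, a, va), (_, b, vb) in pairs:
        if tuple(x == y for x, y in zip(va, vb)) == TARGET:
            return f"{a} = {b}"
    return None

if __name__ == "__main__":
    print(synthesize())
```

Whatever equality the search returns is equivalent, on all 4-bit inputs, to the condition $t \mathbin{\&} s \approx t$ listed for $x \mathbin{\&} s \approx t$ in Table 2.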

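Using operators $O_1$ ($O_2$) we were able to synthesize 98 (116) out of 140 invertibility conditions, with 118 unique solutions overall. When we found more than one solution for a condition (either with operators $O_1$ and $O_2$, or manually) we chose the one that involved the smallest number of bit-vector operators. Thus, we ended up using 79 out of 118 synthesized conditions and 83 manually crafted conditions.

³ Available at https://cvc4.cs.stanford.edu/papers/CAV2018-QBV/.

Table 2. Conditions for the invertibility of bit-vector operators over (dis)equality. Those for $\cdot$, $\&$ and $\mid$ are given modulo commutativity of those operators.

| $\ell[x]$ | $\approx$ | $\not\approx$ |
|---|---|---|
| $x \cdot s \bowtie t$ | $(-s \mid s) \mathbin{\&} t \approx t$ | $s \not\approx 0 \vee t \not\approx 0$ |
| $x \bmod s \bowtie t$ | ${\sim}(-s) \ge_u t$ | $s \not\approx 1 \vee t \not\approx 0$ |
| $s \bmod x \bowtie t$ | $(t + t - s) \mathbin{\&} s \ge_u t$ | $s \not\approx 0 \vee t \not\approx 0$ |
| $x \div s \bowtie t$ | $(s \cdot t) \div s \approx t$ | $s \not\approx 0 \vee t \not\approx {\sim}0$ |
| $s \div x \bowtie t$ | $s \div (s \div t) \approx t$ | $s \mathbin{\&} t \approx 0$ if $\kappa(s) = 1$; $\top$ otherwise |
| $x \mathbin{\&} s \bowtie t$ | $t \mathbin{\&} s \approx t$ | $s \not\approx 0 \vee t \not\approx 0$ |
| $x \mid s \bowtie t$ | $t \mid s \approx t$ | $s \not\approx {\sim}0 \vee t \not\approx {\sim}0$ |
| $x \gg s \bowtie t$ | $(t \ll s) \gg s \approx t$ | $t \not\approx 0 \vee s <_u \kappa(s)$ |
| $s \gg x \bowtie t$ | $\bigvee_{i=0}^{\kappa(s)} s \gg i \approx t$ | $s \not\approx 0 \vee t \not\approx 0$ |
| $x \gg_a s \bowtie t$ | $(s <_u \kappa(s) \Rightarrow (t \ll s) \gg_a s \approx t) \wedge (s \ge_u \kappa(s) \Rightarrow (t \approx {\sim}0 \vee t \approx 0))$ | $\top$ |
| $s \gg_a x \bowtie t$ | $\bigvee_{i=0}^{\kappa(s)} s \gg_a i \approx t$ | $(t \not\approx 0 \vee s \not\approx 0) \wedge (t \not\approx {\sim}0 \vee s \not\approx {\sim}0)$ |
| $x \ll s \bowtie t$ | $(t \gg s) \ll s \approx t$ | $t \not\approx 0 \vee s <_u \kappa(s)$ |
| $s \ll x \bowtie t$ | $\bigvee_{i=0}^{\kappa(s)} s \ll i \approx t$ | $s \not\approx 0 \vee t \not\approx 0$ |
| $x \circ s \bowtie t$ | $s \approx t[\kappa(s) - 1 : 0]$ | $\top$ |
| $s \circ x \bowtie t$ | $s \approx t[\kappa(t) - 1 : \kappa(t) - \kappa(s)]$ | $\top$ |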

In some cases, the SyGuS approach was able to synthesize invertibility conditions that were smaller than those we had manually crafted. For example, we manually defined the invertibility condition for $x \cdot s \approx t$ as $(t \approx 0) \vee ((t \mathbin{\&} -t) \ge_u (s \mathbin{\&} -s) \wedge (s \not\approx 0))$. With SyGuS we obtained $((-s \mid s) \mathbin{\&} t) \approx t$. For some other cases, however, the synthesized solution involved more bit-vector operators than needed. For example, for $x \bmod s \not\approx t$ we manually defined the invertibility condition $(s \not\approx 1) \vee (t \not\approx 0)$, whereas SyGuS produced the solution ${\sim}(-s) \mid t \not\approx 0$. For the majority of invertibility conditions, finding a solution did not require more than one hour of CPU time on an Intel Xeon E5-2637 with 3.5 GHz. Interestingly, the most time-consuming synthesis task (over 107 h of CPU time) was finding condition $((t + t) - s) \mathbin{\&} s \ge_u t$ for $s \bmod x \approx t$. A small number of synthesized solutions were only correct for a bit-width of 4, e.g., solution $({\sim}s \ll s) \ll s <_s t$ for $x \div s <_s t$. In total, we found 6 width-dependent synthesized solutions, all of them for bit-vector operators $\div$ and $\bmod$. For those, we used the manually crafted invertibility conditions instead.
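Table 3. Conditions for the invertibility of bit-vector operators over unsigned inequality. Those for $\cdot$, $\&$ and $\mid$ are given modulo commutativity of those operators.

| $\ell[x]$ | $<_u$ | $>_u$ |
|---|---|---|
| $x \cdot s \bowtie t$ | $t \not\approx 0$ | $t <_u (-s \mid s)$ |
| $x \bmod s \bowtie t$ | $t \not\approx 0$ | $t <_u {\sim}(-s)$ |
| $s \bmod x \bowtie t$ | $t \not\approx 0$ | $t <_u s$ |
| $x \div s \bowtie t$ | $0 <_u s \wedge 0 <_u t$ | ${\sim}0 \div s >_u t$ |
| $s \div x \bowtie t$ | $0 <_u {\sim}(-t \mathbin{\&} s) \wedge 0 <_u t$ | $t <_u {\sim}0$ |
| $x \mathbin{\&} s \bowtie t$ | $t \not\approx 0$ | $t <_u s$ |
| $x \mid s \bowtie t$ | $s <_u t$ | $t <_u {\sim}0$ |
| $x \gg s \bowtie t$ | $t \not\approx 0$ | $t <_u ({\sim}s \gg s)$ |
| $s \gg x \bowtie t$ | $t \not\approx 0$ | $t <_u s$ |
| $x \gg_a s \bowtie t$ | $t \not\approx 0$ | $t <_u {\sim}0$ |
| $s \gg_a x \bowtie t$ | $(s <_u t \vee s \ge_s 0) \wedge t \not\approx 0$ | $s <_s (s \gg {\sim}t) \vee t <_u s$ |
| $x \ll s \bowtie t$ | $t \not\approx 0$ | $t <_u ({\sim}0 \ll s)$ |
| $s \ll x \bowtie t$ | $t \not\approx 0$ | $\bigvee_{i=0}^{\kappa(s)} (s \ll i) >_u t$ |
| $x \circ s \bowtie t$ | $t_x \approx 0 \Rightarrow s <_u t_s$ | $t_x \approx {\sim}0 \Rightarrow s >_u t_s$ |
| $s \circ x \bowtie t$ | $s \le_u t_s \wedge (s \approx t_s \Rightarrow t_x \not\approx 0)$ | $s \ge_u t_s \wedge (s \approx t_s \Rightarrow t_x \not\approx {\sim}0)$ |

For $x \circ s \bowtie t$: $t_x = t[\kappa(t) - 1 : \kappa(t) - \kappa(x)]$ and $t_s = t[\kappa(s) - 1 : 0]$. For $s \circ x \bowtie t$: $t_x = t[\kappa(x) - 1 : 0]$ and $t_s = t[\kappa(t) - 1 : \kappa(t) - \kappa(s)]$.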


3.2 Verifying Invertibility Conditions 


We verified the correctness of all 162 invertibility conditions for bit-widths from 1 to 65 by checking for each bit-width the $T_{BV}$-unsatisfiability of the formula $\neg(\phi_c \Leftrightarrow \exists x.\, \ell[x])$ where $\ell$ ranges over the literals in Tables 2 and 3 with $s$ and $t$ replaced by fresh constants, and $\phi_c$ is the corresponding invertibility condition.
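For very small widths the same criterion can be checked without an SMT solver, by enumerating all values. The Python sketch below (a brute-force stand-in for the solver-based verification, with helper names of our own) checks several rows of Tables 2 and 3 against $\exists x.\, \ell[x]$ for widths 1 to 5. It also shows why the width-1 case matters: using $\top$ for $s \div x \not\approx t$ fails exactly at width 1 on $s = t = 1$, which is the case covered by the special condition $s \mathbin{\&} t \approx 0$ in Table 2. Exhaustive enumeration is exponential in the width, so it is no substitute for the SMT-based checks up to width 65.

```python
from itertools import product

def mask(n):
    return (1 << n) - 1

def shl(a, b, n):   # bvshl: shifting by n or more bits yields 0
    return (a << b) & mask(n) if b < n else 0

def lshr(a, b, n):  # bvlshr
    return a >> b if b < n else 0

def udiv(a, b, n):  # bvudiv: division by zero yields all ones
    return mask(n) if b == 0 else a // b

def urem(a, b, n):  # bvurem: remainder by zero yields the dividend
    return a if b == 0 else a % b

def verify(literal, ic, n):
    """Pairs (s, t) of n-bit values on which ic(s, t) and (exists x. literal) disagree."""
    size = 1 << n
    bad = []
    for s, t in product(range(size), repeat=2):
        exists = any(literal(x, s, t, n) for x in range(size))
        if exists != ic(s, t, n):
            bad.append((s, t))
    return bad

# A few rows of Tables 2 and 3: name -> (literal(x, s, t, n), condition(s, t, n))
ROWS = {
    "x * s = t":     (lambda x, s, t, n: (x * s) & mask(n) == t,
                      lambda s, t, n: ((-s | s) & t) == t),
    "x * s != t":    (lambda x, s, t, n: (x * s) & mask(n) != t,
                      lambda s, t, n: s != 0 or t != 0),
    "x * s >u t":    (lambda x, s, t, n: (x * s) & mask(n) > t,
                      lambda s, t, n: t < ((-s | s) & mask(n))),
    "x mod s = t":   (lambda x, s, t, n: urem(x, s, n) == t,
                      lambda s, t, n: (~(-s) & mask(n)) >= t),
    "x & s = t":     (lambda x, s, t, n: (x & s) == t,
                      lambda s, t, n: (t & s) == t),
    "x >> s = t":    (lambda x, s, t, n: lshr(x, s, n) == t,
                      lambda s, t, n: lshr(shl(t, s, n), s, n) == t),
    "s udiv x != t": (lambda x, s, t, n: udiv(s, x, n) != t,
                      lambda s, t, n: (s & t) == 0 if n == 1 else True),
}

if __name__ == "__main__":
    for name, (lit, ic) in ROWS.items():
        print(name, all(verify(lit, ic, n) == [] for n in range(1, 6)))
```

Replacing the last condition by a constant `True` makes `verify` report `[(1, 1)]` at width 1 and nothing at width 4.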

In total, we generated 12,980 verification problems and used all participating solvers of the quantified bit-vector division of the 2017 SMT competition to verify them. For each solver/benchmark pair we used a CPU time limit of one hour and a memory limit of 8 GB on the same machines as those mentioned in the previous section. We consider an invertibility condition to be verified for a certain bit-width if at least one of the solvers was able to report unsatisfiable for the corresponding formula within the given time limit. Out of the 12,980 instances, we were able to verify 12,277 (94.6%).

Overall, all verification tasks (including timeouts) required a total of 275 days of CPU time. The success rate of each individual solver was 91.4% for Boolector, 85.0% for CVC4, 50.8% for Q3B, and 92% for Z3. We observed that on 30.6% of the problems, Q3B exited with a Python exception without returning any result. For bit-vector operators $\{{\sim}, -, +, \&, \mid, \gg, \gg_a, \ll, \circ\}$, over all relations, and for operators $\{\cdot, \div, \bmod\}$ over relations $\{\not\approx, \le_u, \le_s\}$, we were able to verify all invertibility conditions for all bit-widths in the range 1–65. Interestingly, no solver was able to verify the invertibility conditions for $x \bmod s <_s t$ with a bit-width of 54 and $s \bmod x <_u t$ with bit-widths 35–37 within the allotted time. We attribute this to the underlying heuristics used by the SAT solvers in these systems. All other conditions for $<_u$ and $<_s$ were verified for all bit-vector operators up to bit-width 65. The remaining conditions for operators $\{\cdot, \div, \bmod\}$ over relations $\{\approx, >_u, \ge_u, >_s, \ge_s\}$ were verified up to at least a bit-width of 14. We discovered 3 conditions for $s \div x \bowtie t$ with $\bowtie \in \{\not\approx, >_s, \ge_s\}$ that were not correct for a bit-width of 1. For each of these cases, we added an additional invertibility condition that correctly handles that case.

We leave to future work the task of formally proving that our invertibility 
conditions are correct for all bit-widths. Since this will most likely require the 
development of an interactive proof, we could leverage some recent work by Ekici 
et al. [9] that includes a formalization in the Coq proof assistant of the SMT-LIB 
theory of bit-vectors. 


4 Counterexample-Guided Instantiation for Bit-Vectors


In this section, we leverage techniques from the previous section for constructing 
symbolic solutions to bit-vector constraints to define a novel instantiation-based 
technique for quantified bit-vector formulas. We first briefly present the overall 
theory-independent procedure we use for quantifier instantiation and then show 
how it can be specialized to quantified bit-vectors using invertibility conditions. 

We use a counterexample-guided approach for quantifier instantiation, as given by procedure $\mathrm{CEGQI}_S$ in Fig. 2. To simplify the exposition here, we focus on input problems expressed as a single formula in prenex normal form and with up to one quantifier alternation. We stress, though, that the approach applies in general to arbitrary sets of quantified formulas in some $\Sigma$-theory $T$ with a decidable quantifier-free fragment. The procedure checks via instantiation the $T$-satisfiability of a quantified input formula $\varphi$ of the form $\exists y \forall x.\, \psi[x, y]$ where $\psi$ is quantifier-free and $x$ and $y$ are possibly empty sequences of variables. It maintains an evolving set $\Gamma$, initially empty, of quantifier-free instances of the input formula. During each iteration of the procedure's loop, there are three possible cases: (1) if $\Gamma$ is $T$-unsatisfiable, the input formula $\varphi$ is also $T$-unsatisfiable and “unsat” is returned; (2) if $\Gamma$ is $T$-satisfiable but not together with $\neg\psi[y, x]$, the negated body of $\varphi$, then $\Gamma$ entails $\varphi$ in $T$, hence $\varphi$ is $T$-satisfiable and “sat” is returned; (3) if neither of the previous cases holds, the procedure adds to $\Gamma$ an instance of $\psi$ obtained by replacing the variables $x$ with some terms $t$, and continues. The procedure CEGQI is parametrized by a selection function $S$ that generates the terms $t$.


Definition 4 (Selection Function). A selection function takes as input a tuple of variables $x$, a model $\mathcal{I}$ of $T$, a quantifier-free $\Sigma$-formula $\psi[x]$, and a set $\Gamma$ of $\Sigma$-formulas such that $x \notin FV(\Gamma)$ and $\mathcal{I} \models \Gamma \cup \{\neg\psi\}$. It returns a tuple of $\varepsilon$-valid terms $t$ of the same type as $x$ such that $FV(t) \subseteq FV(\psi) \setminus x$.
$\mathrm{CEGQI}_S(\exists y \forall x.\, \psi[y, x])$
    $\Gamma := \emptyset$
    Repeat:
        1. If $\Gamma$ is $T$-unsatisfiable, then return “unsat”.
        2. Otherwise, if $\Gamma' = \Gamma \cup \{\neg\psi[y, x]\}$ is $T$-unsatisfiable, then return “sat”.
        3. Otherwise, let $\mathcal{I}$ be a model of $T$ and $\Gamma'$ and $t = S(x, \psi, \mathcal{I}, \Gamma)$. $\Gamma := \Gamma \cup \{\psi[y, t]\}$.

Fig. 2. A counterexample-guided quantifier instantiation procedure $\mathrm{CEGQI}_S$, parameterized by a selection function $S$, for determining the $T$-satisfiability of $\exists y \forall x.\, \psi[y, x]$ with $\psi$ quantifier-free and $FV(\psi) = y \cup x$.
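The loop of Fig. 2 can be simulated on a toy scale where every satisfiability check is done by enumeration. In the Python sketch below (an illustration under our own assumptions, not CVC4's implementation) $y$ and $x$ range over $n$-bit values, the body $\psi$ is a Python predicate, and the selection function always returns the value of $x$ in the counterexample model, i.e., the model-value configuration discussed in Sect. 4.1. This selection function is finite (at most $2^n$ distinct terms) and monotonic (a counterexample value can never already occur in $\Gamma$, since the model satisfies every instance in $\Gamma$), so by Theorem 6 the loop terminates.

```python
from typing import Callable, List, Tuple

def cegqi(psi: Callable[[int, int], bool], n: int) -> Tuple[str, List[int]]:
    """Toy CEGQI for  exists y. forall x. psi(y, x)  with y, x ranging over n-bit values.

    Gamma is kept as the list of instantiation values t, standing for the
    instances psi(y, t). Satisfiability checks are done by enumeration."""
    size = 1 << n
    gamma: List[int] = []
    while True:
        # Step 1: if no y satisfies every instance, Gamma is unsatisfiable -> "unsat"
        ys = [y for y in range(size) if all(psi(y, t) for t in gamma)]
        if not ys:
            return "unsat", gamma
        # Step 2: look for a model of Gamma together with the negated body
        model = next(((y, x) for y in ys for x in range(size) if not psi(y, x)), None)
        if model is None:
            return "sat", gamma
        # Step 3: model-value selection: instantiate x with its value in the model
        _, x_value = model
        gamma.append(x_value)

if __name__ == "__main__":
    # exists y. forall x. x + y != 3 (4-bit): unsat, since x = 3 - y is a counterexample
    print(cegqi(lambda y, x: (x + y) % 16 != 3, 4)[0])
    # exists y. forall x. x & y = 0 (4-bit): sat, witnessed by y = 0
    print(cegqi(lambda y, x: (x & y) == 0, 4)[0])
```

With concrete values only, the first example needs all 16 instances before $\Gamma$ becomes unsatisfiable; the symbolic instantiation $x \mapsto 3 - y$ obtained by inverting $+$ would refute it with a single instance, which is the motivation for the selection functions of Sect. 4.1.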


Definition 5. Let $\psi[x]$ be a quantifier-free $\Sigma$-formula. A selection function is:

1. Finite for $x$ and $\psi$ if there is a finite set $S^*$ such that $S(x, \psi, \mathcal{I}, \Gamma) \in S^*$ for all legal inputs $\mathcal{I}$ and $\Gamma$.

2. Monotonic for $x$ and $\psi$ if for all legal inputs $\mathcal{I}$ and $\Gamma$, $S(x, \psi, \mathcal{I}, \Gamma) = t$ only if $\psi[t] \notin \Gamma$.


Procedure $\mathrm{CEGQI}_S$ is refutation-sound and model-sound for any selection function $S$, and terminating for selection functions that are finite and monotonic.


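Theorem 6 (Correctness of $\mathrm{CEGQI}_S$). Let $S$ be a selection function and let $\varphi = \exists y \forall x.\, \psi[y, x]$ be a legal input for $\mathrm{CEGQI}_S$. Then the following hold.

1. If $\mathrm{CEGQI}_S(\varphi)$ returns “unsat”, then $\varphi$ is $T$-unsatisfiable.

2. If $\mathrm{CEGQI}_S(\varphi)$ returns “sat” for some final $\Gamma$, then $\varphi$ is $T$-equivalent to $\exists y.\, \bigwedge_{\gamma \in \Gamma} \gamma$.

3. If $S$ is finite and monotonic for $x$ and $\psi$, then $\mathrm{CEGQI}_S(\varphi)$ terminates.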


Thanks to this theorem, to define a $T$-satisfiability procedure for quantified $\Sigma$-formulas, it suffices to define a selection function satisfying the criteria of Definition 4. We do that in the following section for $T_{BV}$.


4.1 Selection Functions for Bit-Vectors 


In Fig. 3, we define a (class of) selection functions $S_c^{BV}$ for quantifier-free bit-vector formulas, which is parameterized by a configuration $c$, a value of the enumeration type $\{m, k, s, b\}$. The selection function collects in the set $M$ all the literals occurring in $\Gamma'$ that are satisfied by $\mathcal{I}$. Then, it collects in the set $N$ a projected form of each literal in $M$. This form is computed by the function $\mathrm{project}_c$, parameterized by configuration $c$. That function transforms its input literal into a form suitable for function solve from Fig. 1. We discuss the intuition for projection operations in more detail below.

After constructing set $N$, the selection function computes a term $t_i$ for each variable $x_i$ in tuple $x$, which we call the solved form of $x_i$. To do that, it first constructs a set of literals $N_i$ all linear in $x_i$. It considers literals $\ell$ from $N$ and replaces all previously solved variables $x_1, \ldots, x_{i-1}$ by their respective solved forms to obtain the literal $\ell' = \ell[t_1, \ldots, t_{i-1}]$. It then calls function linearize on literal $\ell'$, which returns a set of literals, each obtained by replacing all but one occurrence of $x_i$ in $\ell'$ with the value of $x_i$ in $\mathcal{I}$.⁴

$S_c^{BV}(x, \psi, \mathcal{I}, \Gamma)$ where $c \in \{m, k, s, b\}$
    Let $M = \{\ell \mid \mathcal{I} \models \ell,\ \ell \in \mathrm{Lit}(\psi)\}$, $N = \{\mathrm{project}_c(\mathcal{I}, \ell) \mid \ell \in M\}$.
    For $i = 1, \ldots, n$ where $x = (x_1, \ldots, x_n)$:
        Let $N_i = \bigcup_{\ell \in N} \mathrm{linearize}(x_i, \mathcal{I}, \ell[t_1, \ldots, t_{i-1}])$.
        Let $t_i = \mathrm{solve}(x_i, \mathrm{choose}(N_i))$ if $N_i$ is non-empty, and $t_i = x_i^{\mathcal{I}}$ otherwise.
        $t_j = t_j\{x_i \mapsto t_i\}$ for each $j < i$.
    Return $(t_1, \ldots, t_n)$.

    $\mathrm{project}_m(\mathcal{I}, s \bowtie t)$: return $\top$
    $\mathrm{project}_k(\mathcal{I}, s \bowtie t)$: return $s \bowtie t$
    $\mathrm{project}_s(\mathcal{I}, s \bowtie t)$: return $s \approx t + (s - t)^{\mathcal{I}}$
    $\mathrm{project}_b(\mathcal{I}, s \bowtie t)$: return $s \approx t$ if $s^{\mathcal{I}} = t^{\mathcal{I}}$; $s \approx t + 1$ if $s^{\mathcal{I}} > t^{\mathcal{I}}$; $s \approx t - 1$ if $s^{\mathcal{I}} < t^{\mathcal{I}}$

Fig. 3. Selection functions $S_c^{BV}$ for quantifier-free bit-vector formulas. The procedure is parameterized by a configuration $c$, one of either m (model value), k (keep), s (slack), or b (boundary).


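Example 7. Consider an interpretation $\mathcal{I}$ where $x^{\mathcal{I}} = 1$, and $\Sigma_{BV}$-terms $a$ and $b$ with $x \notin FV(a) \cup FV(b)$. We have that $\mathit{linearize}(x, \mathcal{I}, x \cdot (x + a) \approx b)$ returns the set $\{1 \cdot (x + a) \approx b,\ x \cdot (1 + a) \approx b\}$; $\mathit{linearize}(x, \mathcal{I}, x \geq_u a)$ returns the singleton set $\{x \geq_u a\}$; $\mathit{linearize}(x, \mathcal{I}, a \approx b)$ returns the empty set.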
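As an illustration of Example 7, the following Python sketch implements linearize on a toy term representation. The nested-tuple encoding of terms and the helper names `occurrences` and `replace_all_but` are assumptions made for this sketch only, not the data structures of the actual implementation; occurrences of $x$ are numbered in left-to-right order.

```python
# Hypothetical term encoding: ("var", name), ("const", value), or (op, child, ...).
Term = tuple


def occurrences(t: Term, x: str) -> int:
    """Count the occurrences of variable x in term t."""
    if t == ("var", x):
        return 1
    if t[0] in ("var", "const"):
        return 0
    return sum(occurrences(c, x) for c in t[1:])


def replace_all_but(t: Term, x: str, value: int, keep: int, seen: list) -> Term:
    """Replace every occurrence of x by ("const", value) except the keep-th one
    (0-based, left to right). seen[0] counts the occurrences visited so far."""
    if t == ("var", x):
        index = seen[0]
        seen[0] += 1
        return t if index == keep else ("const", value)
    if t[0] in ("var", "const"):
        return t
    return (t[0],) + tuple(replace_all_but(c, x, value, keep, seen) for c in t[1:])


def linearize(x: str, model: dict, lit: Term) -> set:
    """One literal per occurrence of x: that occurrence stays symbolic,
    all other occurrences are fixed to the model value of x."""
    return {replace_all_but(lit, x, model[x], k, [0]) for k in range(occurrences(lit, x))}


# Example 7: x^I = 1 and the literal x * (x + a) ≈ b
lit = ("eq", ("mul", ("var", "x"), ("add", ("var", "x"), ("var", "a"))), ("var", "b"))
for result in sorted(linearize("x", {"x": 1}, lit), key=repr):
    print(result)
```

A literal without any occurrence of $x$ yields the empty set, and a literal with a single occurrence is returned unchanged, matching the other two cases of Example 7.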


If the set $N_i$ is non-empty, the selection function heuristically chooses a literal from $N_i$ (indicated in Fig. 3 with $\mathit{choose}(N_i)$). It then computes a solved form $t_i$ for $x_i$ by solving the chosen literal for $x_i$ with the function solve described in the previous section. If $N_i$ is empty, we let $t_i$ be simply the value of $x_i$ in the given model $\mathcal{I}$. After that, $x_i$ is eliminated from all the previous terms $t_1, \ldots, t_{i-1}$ by replacing it with $t_i$. After processing all $n$ variables of $\mathbf{x}$, the tuple $(t_1, \ldots, t_n)$ is returned.

The configurations of selection function $S^{BV}_c$ determine how literals in $M$ are modified by the $\mathit{project}_c$ function prior to computing solved forms, based on the current model $\mathcal{I}$. With the model value configuration m, the selection function effectively ignores the structure of all literals in $M$ and (because the set $N_i$ is empty) ends up choosing the value $x_i^{\mathcal{I}}$ as the solved form of variable $x_i$, for each $i$. On the other end of the spectrum, the configuration k keeps all literals in $M$ unchanged. The remaining two configurations have an effect on how disequalities and inequalities are handled by $\mathit{project}_c$. With configuration s, $\mathit{project}_c$ normalizes any kind of literal (equality, inequality or disequality) $s \bowtie t$ to an equality by adding the slack value $(s - t)^{\mathcal{I}}$ to $t$. With configuration b, it maps equalities to themselves and inequalities and disequalities to an equality corresponding to a boundary point of the relation between $s$ and $t$ based on the current model. Specifically, it adds one to $t$ if $s$ is greater than $t$ in $\mathcal{I}$, it subtracts one if $s$ is smaller than $t$, and returns $s \approx t$ if their value is the same. These two configurations are inspired by quantifier elimination techniques for linear arithmetic [5,15]. In the following, we provide an end-to-end example of our technique for quantifier instantiation that makes use of selection function $S^{BV}_c$.

⁴ This is a simple heuristic to generate literals that can be solved for $x_i$. More elaborate heuristics could be used in practice.


248 A. Niemetz et al.
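The four projection modes can be sketched in Python over concrete model values. This is a hypothetical helper, not the solver's code. It assumes that "greater" and "smaller" in configuration b refer to unsigned comparison of the model values, and it encodes a projected equality $s \approx t + c$ by its offset $c$ taken modulo $2^{\text{width}}$.

```python
def project(config: str, rel: str, s_val: int, t_val: int, width: int):
    """Sketch of project_c for a literal `s rel t` that holds in the current model.

    s_val and t_val are the model values of s and t (unsigned, `width` bits).
    Result encoding (an assumption of this sketch):
      None       -> the literal is replaced by ⊤           (configuration m)
      ("keep",)  -> the literal is returned unchanged      (configuration k)
      ("eq", c)  -> the equality s ≈ t + c, c mod 2**width (configurations s, b)
    """
    mask = (1 << width) - 1
    if config == "m":
        return None
    if config == "k":
        return ("keep",)
    if config == "s":
        # slack value (s - t)^I makes s ≈ t + (s - t)^I true in the model
        return ("eq", (s_val - t_val) & mask)
    if config == "b":
        if rel == "eq" or s_val == t_val:
            return ("eq", 0)          # equalities (and equal values) give s ≈ t
        if s_val > t_val:
            return ("eq", 1)          # boundary point s ≈ t + 1
        return ("eq", mask)           # boundary point s ≈ t - 1, i.e. t + (2**width - 1)
    raise ValueError(f"unknown configuration {config!r}")


print(project("s", "uge", 1, 0, 8))   # ('eq', 1)
print(project("b", "ult", 3, 7, 8))   # ('eq', 255)
```

With the model values used in Example 8 ($s^{\mathcal{I}} = 1$, $t^{\mathcal{I}} = 0$), configurations s and b both produce the offset 1, that is, the equality $x_1 \cdot a \approx b + 1$.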


Example 8. Consider formula $\varphi = \exists \mathbf{y}. \forall x_1. (x_1 \cdot a <_u b)$ where $a$ and $b$ are terms with no free occurrences of $x_1$. To determine the satisfiability of $\varphi$, we invoke $\mathrm{CEGQI}_{S^{BV}_c}$ on $\varphi$ for some configuration $c$. Say that in the first iteration of the loop, we find that $\Gamma' = \Gamma \cup \{x_1 \cdot a \geq_u b\}$ is satisfied by some model $\mathcal{I}$ of $T_{BV}$ such that $x_1^{\mathcal{I}} = 1$, $a^{\mathcal{I}} = 1$, and $b^{\mathcal{I}} = 0$. We invoke $S^{BV}_c((x_1), \mathcal{I}, \Gamma')$ and first compute $M = \{x_1 \cdot a \geq_u b\}$, the set of literals of $\Gamma'$ that are satisfied by $\mathcal{I}$. The table below summarizes the values of the internal variables of $S^{BV}_c$ for the various configurations:


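Config | $N_1$ | $t_1$
m | $\emptyset$ | $1$
k | $\{x_1 \cdot a \geq_u b\}$ | $\epsilon z. ((-a \mid a) \geq_u b \Rightarrow z \cdot a \geq_u b)$
s, b | $\{x_1 \cdot a \approx b + 1\}$ | $\epsilon z. (((-a \mid a) \,\&\, (b + 1) \approx b + 1) \Rightarrow z \cdot a \approx b + 1)$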


In each case, $S^{BV}_c$ returns the tuple $(t_1)$, and we add the instance $t_1 \cdot a <_u b$ to $\Gamma'$. Consider configuration k, where $t_1$ is the choice expression $\epsilon z. ((-a \mid a) \geq_u b \Rightarrow z \cdot a \geq_u b)$. Since $t_1$ is $\epsilon$-valid, due to the semantics of $\epsilon$, this instance is equivalent to:


$$((-a \mid a) \geq_u b \Rightarrow k \cdot a \geq_u b) \wedge k \cdot a <_u b \qquad (1)$$


for fresh variable $k$. This formula is $T_{BV}$-satisfiable if and only if $\neg((-a \mid a) \geq_u b)$ is $T_{BV}$-satisfiable. In the second iteration of the loop in $\mathrm{CEGQI}_{S^{BV}_k}$, set $\Gamma'$ contains formula (1) above. We have two possible outcomes:


(i) $\neg((-a \mid a) \geq_u b)$ is $T_{BV}$-unsatisfiable. Then (1) and hence $\Gamma$ are $T_{BV}$-unsatisfiable, and the procedure terminates with “unsat”.

(ii) $\neg((-a \mid a) \geq_u b)$ is satisfied by some model $\mathcal{J}$ of $T_{BV}$. Then $\exists z. z \cdot a \geq_u b$ is false in $\mathcal{J}$, since the invertibility condition of $z \cdot a \geq_u b$ is false in $\mathcal{J}$. Hence, $\Gamma' = \Gamma \cup \{x_1 \cdot a \geq_u b\}$ is unsatisfiable, and the algorithm terminates with “sat”.
In fact, we argue later that quantified bit-vector formulas like $\varphi$ above, which contain only one occurrence of a universal variable, require at most one instantiation before $\mathrm{CEGQI}_{S^{BV}_k}$ terminates. The same guarantee does not hold with the other configurations. In particular, configuration m generates the instantiation where $t_1$ is 1, which simplifies to $a <_u b$. This may not be sufficient to show that $\Gamma$ or $\Gamma'$ is unsatisfiable in the second iteration of the loop, and the algorithm may resort to enumerating a repeating pattern of instantiations, such as $x_1 \mapsto 1, 2, 3, \ldots$ and so on. This obviously does not scale for problems with large bit-widths.
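The invertibility condition used in Example 8, $(-a \mid a) \geq_u b$ for the literal $x \cdot a \geq_u b$, can be sanity-checked by brute force at small bit-widths. The Python sketch below compares the condition with an exhaustive search for a solution $x$. The helper names are illustrative, and agreement for widths 1 to 5 is evidence, not a proof that the condition is correct for every bit-width.

```python
def ic_mul_uge(a: int, b: int, width: int) -> bool:
    """Invertibility condition (-a | a) >=u b for the literal x * a >=u b."""
    mask = (1 << width) - 1
    return (((-a) & mask) | a) >= b


def has_solution(a: int, b: int, width: int) -> bool:
    """Brute force: does some x satisfy x * a >=u b at this bit-width?"""
    mask = (1 << width) - 1
    return any(((x * a) & mask) >= b for x in range(1 << width))


# Exhaustively compare both sides for all a, b at small bit-widths.
for width in range(1, 6):
    for a in range(1 << width):
        for b in range(1 << width):
            assert ic_mul_uge(a, b, width) == has_solution(a, b, width)
print("invertibility condition agrees with brute force for widths 1..5")
```

The condition works because $(-a \mid a)$ sets every bit from the lowest set bit of $a$ upward, which is exactly the largest value that $x \cdot a$ can take modulo $2^n$.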


More generally, we note that $\mathrm{CEGQI}_{S^{BV}_k}$ terminates with at most one instance for input formulas whose body has just one literal and a single occurrence of each universal variable. The same guarantee does not hold, for instance, for quantified formulas whose body has multiple disjuncts. For some intuition, consider extending the second conjunct of (1) with an additional disjunct, i.e. $(k \cdot a <_u b \vee \ell[k])$. A model can be found for this formula in which the invertibility condition $(-a \mid a) \geq_u b$ is still satisfied, and hence we are not guaranteed to terminate on the second iteration of the loop. Similarly, if the literals of the input formula have multiple occurrences of $x_1$, then multiple instances may be returned by the selection function, since the literals returned by linearize in Fig. 3 depend on the model value of $x_1$, and hence more than one possible instance may be considered in the loop in Fig. 2.

The following theorem summarizes the properties of our selection functions. In the following, we say a quantified formula is unit linear invertible if it is of the form $\forall x. \ell[x]$ where $\ell$ is linear in $x$ and has an invertibility condition for $x$. We say a selection function is $n$-finite for a quantified formula $\psi$ if the number of possible instantiations it returns is at most $n$, for some positive integer $n$.


Theorem 9. Let $\psi[\mathbf{x}]$ be a quantifier-free formula in the signature of $T_{BV}$.


1. $S^{BV}_c$ is a finite selection function for $\mathbf{x}$ and $\psi$ for all $c \in \{m, k, s, b\}$.
2. $S^{BV}_m$ is monotonic.
3. $S^{BV}_k$ is 1-finite if $\psi$ is unit linear invertible.
4. $S^{BV}_k$ is monotonic if $\psi$ is unit linear invertible.


This theorem implies that counterexample-guided instantiation using configuration $S^{BV}_m$ is a decision procedure for quantified bit-vectors. However, in practice the worst-case number of instances considered by this configuration for a variable $x_{[n]}$ is proportional to the number of its possible values ($2^n$), which is practically infeasible for sufficiently large $n$. More interestingly, counterexample-guided instantiation using $S^{BV}_k$ is a decision procedure for quantified formulas that are unit linear invertible, and moreover has the guarantee that at most one instantiation is returned by this selection function. Hence, formulas in this fragment can be effectively reduced to quantifier-free bit-vector constraints in at most two iterations of the loop of procedure $\mathrm{CEGQI}_S$ in Fig. 2.
4.2 Implementation


We implemented the new instantiation techniques described in this section as an extension of CVC4, which is a DPLL(T)-based SMT solver [20] that supports quantifier-free bit-vector constraints, (arbitrarily nested) quantified formulas, and choice expressions. For the latter, all choice expressions $\epsilon x. \varphi[x]$ are eliminated from assertions by replacing them with a fresh variable $k$ of the same type and adding $\varphi[k]$ as a new assertion, which is sound since all choice expressions we consider are $\epsilon$-valid. In the remainder of the paper, we will refer to our extension of the solver as cegqi. In the following, we discuss important implementation details of the extension.


Handling Duplicate Instantiations. The selection functions $S^{BV}_s$ and $S^{BV}_b$ are not guaranteed to be monotonic, and neither is $S^{BV}_k$ for quantified formulas that contain more than one occurrence of universal variables. Hence, when applying these strategies to arbitrary quantified formulas, we use a two-tiered strategy that invokes $S^{BV}_m$ as a second resort if the instance for the terms returned by a selection function already exists in $\Gamma'$.


Linearizing Rewrites. Our selection function in Fig. 3 uses the function linearize to compute literals that are linear in the variable $x_i$ to solve for. The way we presently implement linearize makes those literals dependent on the value of $x_i$ in the current model $\mathcal{I}$, with the risk of overfitting to that model. To address this limitation, we use a set of equivalence-preserving rewrite rules whose goal is to reduce the number of occurrences of $x_i$ to one when possible, by applying basic algebraic manipulations. As a trivial example, a literal like $x_i + x_i \approx a$ is first rewritten to $2 \cdot x_i \approx a$, which is linear in $x_i$ if $a$ does not contain $x_i$. In that case, this literal, and so the original one, has an invertibility condition as discussed in Sect. 3.


Variable Elimination. We use procedure solve from Sect. 3 not only for selecting quantifier instantiations, but also for eliminating variables from quantified formulas. In particular, for a quantified formula of the form $\forall x \mathbf{y}. \ell \Rightarrow \varphi[x, \mathbf{y}]$, if $\ell$ is linear in $x$ and $\mathit{solve}(x, \ell)$ returns a term $s$ containing no $\epsilon$-expressions, we can replace this formula by $\forall \mathbf{y}. \varphi[s, \mathbf{y}]$. When $\ell$ is an equality, this is sometimes called destructive equality resolution (DER) and is an important implementation-level optimization in state-of-the-art bit-vector solvers [25]. As shown in Fig. 1, we use the getInverse function to increase the likelihood that solve returns a term that contains no $\epsilon$-expressions.
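For the equality case, the DER rewrite can be checked on small examples by evaluating both sides exhaustively. In this Python sketch, $s$ stands for an $\epsilon$-free solved form and is modeled as an arbitrary Python function of $y$; the helper name `der_sides` and the 3-bit width are assumptions chosen only to keep the enumeration small.

```python
from itertools import product
from typing import Callable, Tuple


def der_sides(phi: Callable[[int, int], bool], s: Callable[[int], int], width: int) -> Tuple[bool, bool]:
    """Evaluate both sides of the DER rewrite by exhaustive enumeration:
      lhs = forall x, y. (x = s(y)) => phi(x, y)
      rhs = forall y. phi(s(y), y)
    over unsigned `width`-bit values."""
    values = range(1 << width)
    lhs = all(x != s(y) or phi(x, y) for x, y in product(values, values))
    rhs = all(phi(s(y), y) for y in values)
    return lhs, rhs


def succ(y: int) -> int:
    """s = y + 1 on 3-bit values (wraps around at 7)."""
    return (y + 1) & 0b111


print(der_sides(lambda x, y: x != y, succ, 3))  # (True, True)
print(der_sides(lambda x, y: x < y, succ, 3))   # (False, False)
```

Both sides always agree, which is what makes the rewrite sound; the benefit in practice is that the right-hand side has one fewer universally quantified variable.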


Handling Extract. Consider formula $\forall x_{[32]}. (x[31:16] \not\approx a_{[16]} \vee x[15:0] \not\approx b_{[16]})$. Since all invertibility conditions for the extract operator are $\top$, rather than producing choice expressions we have found it more effective to eliminate extracts via rewriting. As a consequence, we independently solve constraints for regions of quantified variables when they appear underneath applications of extract operations. In this example, we let the solved form of $x$ be $y_{[16]} \circ z_{[16]}$, where $y$ and $z$ are fresh variables, and subsequently solve for these variables in $y \approx a$ and $z \approx b$. Hence, we may instantiate $x$ with $a \circ b$, a term that we would not have found by considering the two literals independently in the negated body of the formula above.
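A small Python sketch of the extract example, treating bit-vector values as unsigned integers: it builds the candidate instance $a \circ b$ and confirms that it falsifies the body of the quantified formula. The helper names and the concrete 16-bit values are illustrative assumptions.

```python
def extract(x: int, hi: int, lo: int) -> int:
    """Bit-vector extract x[hi:lo] on an unsigned integer value."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def concat(high: int, low: int, low_width: int) -> int:
    """Bit-vector concatenation high ∘ low, where low occupies low_width bits."""
    return (high << low_width) | low


a, b = 0xBEEF, 0x1234                 # arbitrary 16-bit values standing in for a and b
x = concat(a, b, 16)                  # candidate instance a ∘ b for the 32-bit x
body = extract(x, 31, 16) != a or extract(x, 15, 0) != b
print(hex(x), body)                   # 0xbeef1234 False
```

Solving each extracted region separately is what produces the concatenation; neither literal alone constrains both halves of $x$.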


Solving Quantified Bit-Vectors Using Invertibility Conditions 251 


5 Evaluation 


We implemented our techniques in the solver cegqi and considered four configurations cegqi_c, where c is one of {m, k, s, b}, corresponding to the four selection function configurations described in Sect. 4. Out of these four configurations, cegqi_m is the only one that does not employ our new techniques but uses only model values for instantiation. It can thus be considered our base configuration. All configurations enable the optimizations described in Sect. 4.2 when applicable. We compared them against all entrants of the quantified bit-vector division of the 2017 SMT competition SMT-COMP: Boolector [16], CVC4 [2], Q3B [14] and Z3 [6]. With the exception of Q3B, all solvers are related to our approach since they are instantiation-based. However, none of these solvers utilizes invertibility conditions when constructing instantiations. We ran all experiments on the StarExec logic solving service [24] with a 300 s CPU and wall clock time limit and a 100 GB memory limit.

We evaluated our approach on all 5,151 benchmarks from the quantified bit-vector logic (BV) of SMT-LIB [3]. The results are summarized in Table 4. Configuration cegqi_b solves the highest number of unsatisfiable benchmarks (4,399), which is 30 more than the next best configuration cegqi_s and 37 more than the next best external solver, Z3. Compared to the instantiation-based solvers Boolector, CVC4 and Z3, the performance of cegqi_b is particularly strong on the h-uauto family, which are verification conditions from the Ultimate Automizer tool [11]. For satisfiable benchmarks, Boolector solves the most (581), which is 36 more than our best configuration cegqi_b.


Table 4. Results on satisfiable and unsatisfiable benchmarks with a 300 s timeout.

unsat         Boolector   CVC4    Q3B     Z3 | cegqi_m cegqi_k cegqi_s cegqi_b
h-uauto              14     12     93     24 |      10     103     105     106
keymaera           3917   3790   3781   3923 |    3803    3798    3888    3918
psyco                62     62     49     62 |      62      39      62      61
scholl               57     36     13     67 |      36      27      36      35
tptp                 55     52     56     56 |      56      56      56      56
uauto               137     72    131    137 |      72      72     135     137
ws-fixpoint          74     71     75     74 |      75      74      75      75
ws-ranking           16      8     18     19 |      15      11      12      11
Total unsat        4332   4103   4216   4362 |    4129    4180    4369    4399
sat           Boolector   CVC4    Q3B     Z3 | cegqi_m cegqi_k cegqi_s cegqi_b
h-uauto              15     10     17     13 |      16      17      16      17
keymaera            108     21     24    108 |      20      13      36      75
psyco               131    132     50    131 |     132      60     132     129
scholl              232    160    201    204 |     203     188     208     211
tptp                 17     17     17     17 |      17      17      17      17
uauto                14     14     15     16 |      14      14      14      14
ws-fixpoint          45     49     54     36 |      45      51      49      50
ws-ranking           19     15     37     33 |      33      31      31      32
Total sat           581    418    415    558 |     480     391     503     545
Total (5151)       4913   4521   4631   4920 |    4609    4571    4872    4944

Overall, our best configuration cegqi_b solved 335 more benchmarks than our base configuration cegqi_m. A more detailed runtime comparison between the two is provided by the scatter plot in Fig. 4. Moreover, cegqi_b solved 24 more benchmarks than the best external solver, Z3. In terms of uniquely solved instances, cegqi_b was able to solve 139 benchmarks that were not solved by Z3, whereas Z3 solved 115 benchmarks that cegqi_b did not. Overall, cegqi_b was able to solve 21 of the 79 benchmarks (26.6%) not solved by any of the other solvers. For 18 of these 21 benchmarks, it terminated after considering no more than 4 instantiations. These cases indicate that using symbolic terms for instantiation solves problems for which other techniques, such as those that enumerate instantiations based on model values, do not scale.
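The headline comparisons above follow directly from the totals in Table 4. The Python snippet below recomputes them; the numbers are copied from the table and the text, and no new data is introduced.

```python
# (unsat, sat) totals per solver or configuration, copied from Table 4.
totals = {
    "Boolector": (4332, 581), "CVC4": (4103, 418), "Q3B": (4216, 415), "Z3": (4362, 558),
    "cegqi_m": (4129, 480), "cegqi_k": (4180, 391), "cegqi_s": (4369, 503), "cegqi_b": (4399, 545),
}
solved = {name: unsat + sat for name, (unsat, sat) in totals.items()}

print(solved["cegqi_b"] - solved["cegqi_m"])  # 335: gain over the base configuration
print(solved["cegqi_b"] - solved["Z3"])       # 24: gain over the best external solver
print(139 - 115)                              # 24: difference in uniquely solved instances
print(round(100 * 21 / 79, 1))                # 26.6: share of benchmarks no other solver solved
```

The difference of uniquely solved instances between cegqi_b and Z3 (139 minus 115) matches the difference of their overall totals, as it must.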

Interestingly, configuration cegqi_k, despite having the strong guarantees given by Theorem 9, performed relatively poorly on this set (with 4,571 solved instances overall). We attribute this to the fact that most of the quantified formulas in this set are not unit linear invertible. In total, we found that only 25.6% of the formulas considered during solving were unit linear invertible. However, only a handful of benchmarks were such that all quantified formulas in the problem were unit linear invertible. This might explain the superior performance of cegqi_s and cegqi_b, which use invertibility conditions but in a less monolithic way.


For some intuition on this, consider the problem $\forall x. (x > a \vee x < b)$, where $a$ and $b$ are such that $a > b$ is $T_{BV}$-valid. Intuitively, to show that this formula is unsatisfiable requires the solver to find an $x$ between $b$ and $a$. This is apparent when considering the dual problem $\exists x. (x \leq a \wedge x \geq b)$. Configuration cegqi_b is capable of finding such an $x$, for instance, by considering the instantiation $x \mapsto a$ when solving for the boundary point of the first disjunct. Configuration cegqi_k, on the other hand, would instead consider the instantiation of $x$ for two terms that witness $\epsilon$-expressions: some $k_1$ that is never smaller than $a$, and some $k_2$ that is never greater than $b$. Neither of these terms necessarily resides in between $a$ and $b$, since the solver may subsequently consider models where $k_1 > b$ and $k_2 < a$. This points to a potential use for invertibility conditions that solve multiple literals simultaneously, something we are currently investigating.

Fig. 4. Configuration cegqi_m vs. cegqi_b. (Runtime scatter plot; x-axis: cegqi_m runtime [s]; legend: 10x faster (61), 100x faster (245), 1000x faster (51).)
6 Conclusion


We have presented a new class of strategies for solving quantified bit-vector for- 
mulas based on invertibility conditions. We have derived invertibility conditions 
for the majority of operators in a standard theory of fixed-width bit-vectors. An 
implementation based on this approach solves over 25% of previously unsolved 
verification benchmarks from SMT-LIB, and outperforms all other state-of-the-art bit-vector solvers overall.

In future work, we plan to develop a framework in which the correctness of 
invertibility conditions can be formally established independently of bit-width. 
We are working on deriving invertibility conditions that are optimal for linear 
constraints, in the sense of admitting the simplest propositional encoding. We
are also investigating conditions that cover additional bit-vector operators, some
cases of non-linear literals, as well as those that cover multiple constraints. While 
this is a challenging task, we believe efficient syntax-guided synthesis solvers can 
continue to help push progress in this direction. Finally, we plan to investigate 
the use of invertibility conditions for performing quantifier elimination on bit- 
vector constraints. This will require a procedure for deriving concrete witnesses 
from choice expressions. 
Abstract. Incremental determinization is a recently proposed algorithm for solving quantified Boolean formulas with one quantifier alternation. In this paper, we formalize incremental determinization as a set of inference rules to help understand the design space of similar algorithms. We then present additional inference rules that extend incremental determinization in two ways. The first extension integrates the popular CEGAR principle and the second extension allows us to analyze different cases in isolation. The experimental evaluation demonstrates that the extensions significantly improve the performance.


1 Introduction 


Solving quantified Boolean formulas (QBFs) is one of the core challenges in automated reasoning and is particularly important for applications in verification and synthesis. For example, program synthesis with syntax guidance [1,2] and the synthesis of reactive controllers from LTL specifications have been encoded in QBF [3,4]. Many of these problems require only formulas with one quantifier alternation (2QBF), which are the focus of this paper.

Algorithms for QBF and program synthesis largely rely on the counterexample-guided inductive synthesis principle (CEGIS) [5], originating in abstraction refinement (CEGAR) [6,7]. For example, for program synthesis, CEGIS-style algorithms alternate between generating candidate programs and checking them for counterexamples, which allows us to lift arbitrary verification approaches to synthesis algorithms. Unfortunately, this approach often degenerates into a plain guess-and-check loop when counterexamples cannot be generalized effectively. This carries over to the simpler setting of 2QBF. For example, even for a simple formula such as $\forall x. \exists y.\, x = y$, where $x$ and $y$ are 32-bit numbers, most QBF algorithms simply enumerate all $2^{32}$ pairs of assignments. In fact, even modern QBF solvers diverge on this formula when preprocessing is deactivated.
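The guess-and-check behavior can be reproduced with a toy version of a 2QBF CEGAR loop. In the Python sketch below, explicit enumeration replaces the two SAT solvers that real CEGAR/CEGIS solvers use, and the bit-width is reduced from 32 to 6 so the loop finishes quickly. The function name and its return convention are assumptions for illustration.

```python
def cegar_forall_exists(phi, n_x: int, n_y: int):
    """Decide  forall x. exists y. phi(x, y)  over n_x-bit x and n_y-bit y.

    Brute-force stand-in for a CEGAR/CEGIS loop: explicit enumeration replaces
    the two SAT solvers. Returns (truth value, number of refinements)."""
    xs = range(1 << n_x)
    ys = range(1 << n_y)
    candidates = []  # existential assignments learned so far
    refinements = 0
    while True:
        # Candidate check: find a universal x that defeats every learned y.
        cex = next((x for x in xs if not any(phi(x, y) for y in candidates)), None)
        if cex is None:
            return True, refinements
        # Refinement: find a y that works for this counterexample x.
        witness = next((y for y in ys if phi(cex, y)), None)
        if witness is None:
            return False, refinements
        candidates.append(witness)
        refinements += 1


# forall x. exists y. x = y: every value of x needs its own refinement.
print(cegar_forall_exists(lambda x, y: x == y, 6, 6))  # (True, 64)
```

Each learned value of $y$ covers exactly one value of $x$, so the number of refinements equals $2^n$; at 32 bits this is the $2^{32}$ enumeration described above.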

Recently, Incremental Determinization (ID) has been suggested to overcome this problem [8]. ID represents a departure from the CEGIS approach in that it is structured around identifying which variables have unique Skolem functions. (To prove the truth of a 2QBF $\forall x. \exists y. \varphi$, we have to find Skolem functions $f$ mapping $x$ to $y$ such that $\varphi[f/y]$ is valid.) After assigning Skolem functions to a few of the existential variables, the propagation procedure determines Skolem functions for other variables that are uniquely implied by that assignment. When the assignment of Skolem functions turns out to be incorrect, ID analyzes the conflict, derives a conflict clause, and backtracks some of the assignments. In other words, ID lifts CDCL to the space of Skolem functions.

ID can solve the simple example given above and shows good performance on various application benchmarks. Yet, the QBF competitions have shown that the relative performance of ID and CEGIS still varies a lot between benchmarks [9]. A third family of QBF solvers, based on the expansion of universal variables [10–12], shows yet again different performance characteristics and outperforms both ID and CEGIS on some (few) benchmarks. This variety of performance characteristics of different approaches indicates that current QBF solvers could be significantly improved by integrating the different reasoning principles.

In this paper, we first formalize and generalize ID [8] (Sect. 3). This helps us 
to disentangle the working principles of the algorithm from implementation-level 
design choices. Thereby our analysis of ID enables a systematic and principled 
search for better algorithms for quantified reasoning. To demonstrate the value 
and flexibility of the formalization, we present two extensions of ID that integrate 
CEGIS-style inductive reasoning (Sect. 4) and expansion (Sect. 5). In the exper- 
imental evaluation we demonstrate that both extensions significantly improve 
the performance compared to plain ID (Sect. 6). 


Related Work. This work is written in the tradition of works such as the Model 
Evolution Calculus [13], AbstractDPLL [14], MCSAT [15], and recent calculi for 
QBF [16], which present search algorithms as inference rules to enable the study 
and extension of these algorithms. ID and the inference rules presented in this 
paper can be seen as an instantiation of the more general frameworks, such as 
MCSAT [15] or Abstract Conflict Driven Learning [17]. 

Like ID, quantified conflict-driven clause learning (QCDCL) lifts CDCL to 
QBF [18,19]. The approaches differ in that QCDCL does not reason about func- 
tions, but only about values of variables. Fazekas et al. have formalized QCDCL 
as inference rules [16]. 

2QBF solvers based on CEGAR/CEGIS search for universal assignments and matching existential assignments using two SAT solvers [5,20,21]. There are several generalizations of this approach to QBF with more than one quantifier alternation [22–26].


2 Preliminaries 


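Quantified Boolean formulas over a finite set of variables $x \in X$ with domain $\mathbb{B} = \{0, 1\}$ are generated by the following grammar:


$$\varphi ::= 0 \mid 1 \mid x \mid \neg\varphi \mid (\varphi) \mid \varphi \vee \varphi \mid \varphi \wedge \varphi \mid \exists x. \varphi \mid \forall x. \varphi$$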
We consider all other logical operations, including implication, XOR, and equality, as syntactic sugar with the usual definitions. We abbreviate multiple quantifications $Q x_1. Q x_2. \ldots Q x_n. \varphi$ using the same quantifier $Q \in \{\forall, \exists\}$ by the quantification over the set of variables $X = \{x_1, \ldots, x_n\}$, denoted as $QX. \varphi$.

An assignment $\mathbf{x}$ to a set of variables $X$ is a function $\mathbf{x} : X \to \mathbb{B}$ that maps each variable $x \in X$ to either 1 or 0. Given a propositional formula $\varphi$ over variables $X$ and an assignment $\mathbf{x}'$ to $X' \subseteq X$, we define $\varphi(\mathbf{x}')$ to be the formula obtained by replacing the variables $X'$ by their truth value in $\mathbf{x}'$. By $\varphi(\mathbf{x}', \mathbf{x}'')$ we denote the replacement by multiple assignments for disjoint sets $X', X'' \subseteq X$.

A quantifier $Q x. \varphi$ for $Q \in \{\exists, \forall\}$ binds the variable $x$ in its subformula $\varphi$, and we assume w.l.o.g. that every variable is bound at most once in any formula. A closed QBF is a formula in which all variables are bound. We define the dependency set of an existentially quantified variable $y$ in a formula $\varphi$ as the set $\mathit{dep}(y)$ of universally quantified variables $x$ such that $y$'s subformula $\exists y. \psi$ is a subformula of $x$'s subformula $\forall x. \psi'$. A Skolem function $f_y$ maps assignments to $\mathit{dep}(y)$ to a truth value. We define the truth of a QBF $\varphi$ as the existence of Skolem functions $f_Y = \{f_{y_1}, \ldots, f_{y_n}\}$ for the existentially quantified variables $Y = \{y_1, \ldots, y_n\}$, such that $\varphi(\mathbf{x}, f_Y(\mathbf{x}))$ holds for every $\mathbf{x}$, where $f_Y(\mathbf{x})$ is the assignment to $Y$ that the Skolem functions $f_Y$ provide for $\mathbf{x}$.
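For a 2QBF $\forall X. \exists Y. \varphi$, every existential depends on all universals, so Skolem functions can be pictured as a table from assignments to $X$ to assignments to $Y$. The Python sketch below searches all such tables by brute force, which takes time doubly exponential in $|X|$ and only illustrates the definition of truth above; it is not how any solver discussed here proceeds. The function names and the encoding of $\varphi$ as a Python predicate are assumptions.

```python
from itertools import product


def assignments(n):
    """All assignments to n Boolean variables, as tuples of 0/1."""
    return list(product((0, 1), repeat=n))


def find_skolem(phi, n_x, n_y):
    """Search for Skolem functions for the 2QBF  forall X. exists Y. phi.

    The functions f_Y are represented as one table from assignments to X to
    assignments to Y. Returns the first table with phi(x, f_Y(x)) for every x,
    or None if no such table exists (the formula is false)."""
    xs = assignments(n_x)
    ys = assignments(n_y)
    for choice in product(ys, repeat=len(xs)):
        table = dict(zip(xs, choice))
        if all(phi(x, table[x]) for x in xs):
            return table
    return None


def xor_phi(x, y):
    """phi(x1, x2, y) = y <-> (x1 xor x2)."""
    return y[0] == (x[0] ^ x[1])


print(find_skolem(xor_phi, 2, 1))  # {(0, 0): (0,), (0, 1): (1,), (1, 0): (1,), (1, 1): (0,)}
```

Here the Skolem function for $y$ is uniquely determined (it is XOR), which is exactly the situation that the notion of unique Skolem functions in Sect. 2.1 captures.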

A formula is in prenex normal form if the formula is closed and starts with a sequence of quantifiers followed by a propositional subformula. A formula $\varphi$ is in the kQBF fragment for $k \in \mathbb{N}^+$ if it is closed, in prenex normal form, and has exactly $k - 1$ alternations between $\exists$ and $\forall$ quantifiers.

A literal $l$ is either a variable $x \in X$ or its negation $\neg x$. Given a set of literals $\{l_1, \ldots, l_n\}$, their disjunction $(l_1 \vee \ldots \vee l_n)$ is called a clause and their conjunction $(l_1 \wedge \ldots \wedge l_n)$ is called a cube. We use $\bar{l}$ to denote the literal that is the logical negation of $l$. We denote the variable of a literal by $\mathit{var}(l)$ and lift the notion to clauses: $\mathit{var}(l_1 \vee \cdots \vee l_n) = \{\mathit{var}(l_1), \ldots, \mathit{var}(l_n)\}$.

A propositional formula is in conjunctive normal form (CNF) if it is a conjunction of clauses. A prenex QBF is in prenex conjunctive normal form (PCNF) if its propositional subformula is in CNF. Every QBF $\varphi$ can be transformed into an equivalent PCNF with size $O(|\varphi|)$ [27].


Resolution is a well-known proof rule that allows us to merge two clauses as follows. Given two clauses $C_1 \vee v$ and $C_2 \vee \neg v$, we call $C_1 \otimes_v C_2 = C_1 \vee C_2$ their resolvent with pivot $v$. The resolution rule states that $C_1 \vee v$ and $C_2 \vee \neg v$ imply their resolvent. Resolution is refutationally complete for propositional Boolean formulas, i.e. for every propositional Boolean formula that is equivalent to false we can derive the empty clause.
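The resolution rule and its refutational completeness can be illustrated with a naive saturation procedure. The Python sketch below assumes DIMACS-style clauses (sets of nonzero integers, where $-v$ is the negation of $v$); it repeatedly adds all non-tautological resolvents until either the empty clause appears or nothing new can be derived. It is exponential in general and meant only to make the definitions concrete.

```python
from typing import FrozenSet, Iterable

Clause = FrozenSet[int]  # DIMACS-style literals: v and -v


def resolve(c1: Clause, c2: Clause, lit: int) -> Clause:
    """Resolvent of c1 (containing lit) and c2 (containing -lit) on pivot var(lit)."""
    assert lit in c1 and -lit in c2
    return (c1 - {lit}) | (c2 - {-lit})


def refutes(clauses: Iterable[Iterable[int]]) -> bool:
    """Saturate under resolution; True iff the empty clause is derived."""
    known = {frozenset(c) for c in clauses}
    while frozenset() not in known:
        new = set()
        for c1 in known:
            for c2 in known:
                for lit in c1:
                    if -lit in c2:
                        r = resolve(c1, c2, lit)
                        if not any(-l in r for l in r):  # drop tautological resolvents
                            new.add(r)
        if new <= known:
            return False  # saturated without deriving the empty clause
        known |= new
    return True


print(refutes([{1, 2}, {-1, 2}, {1, -2}, {-1, -2}]))  # True: unsatisfiable
print(refutes([{1, 2}, {-1}]))                        # False: satisfiable
```

Discarding tautological resolvents does not affect completeness, since a clause containing both $l$ and $\bar{l}$ is always satisfied.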

For quantified Boolean formulas, however, we need additional proof rules. The two most prominent ways to make resolution complete for QBF are to add either universal reduction or universal expansion, leading to the proof systems Q-resolution [28] and $\forall$Exp+Res [10,29], respectively.


Universal expansion eliminates a single universal variable by creating two copies of the subformulas of its quantifier. Let $Q_1. \forall x. Q_2. \varphi$ be a QBF in PCNF, where $Q_1$ and $Q_2$ each are a sequence of quantifiers, and let $Q_2$ quantify over variables $X$. Universal expansion yields the equivalent formula $Q_1. Q_2. Q_2'. \varphi[1/x, X'/X] \wedge \varphi[0/x]$, where $Q_2'$ is a copy of $Q_2$ but quantifying over a fresh set of variables $X'$ instead of $X$. The term $\varphi[1/x, X'/X]$ denotes $\varphi$ where $x$ is replaced by 1 and the variables $X$ are replaced by their counterparts in $X'$.
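A sketch of universal expansion on the CNF matrix, assuming DIMACS-style integer literals. Only the propositional part $\varphi[1/x, X'/X] \wedge \varphi[0/x]$ is computed; the quantifier prefix is not represented, and renaming each $v \in X$ to $v + \mathit{offset}$ is an assumed scheme for producing fresh variables.

```python
from typing import FrozenSet, Set

CNF = Set[FrozenSet[int]]


def assign(clauses: CNF, lit: int) -> CNF:
    """Simplify the CNF under lit := true: drop satisfied clauses, remove the negation of lit."""
    return {c - {-lit} for c in clauses if lit not in c}


def expand_universal(clauses: CNF, x: int, inner: Set[int], offset: int) -> CNF:
    """Universal expansion of x on the matrix:  phi[1/x, X'/X]  and  phi[0/x].

    `inner` is the set X of variables quantified after x. In the copy for x = 1,
    each v in X is renamed to v + offset (an assumed fresh-variable scheme)."""
    def rename(lit: int) -> int:
        v = abs(lit)
        if v not in inner:
            return lit
        return v + offset if lit > 0 else -(v + offset)

    positive = {frozenset(rename(l) for l in c) for c in assign(clauses, x)}
    negative = assign(clauses, -x)
    return positive | negative


# forall x. exists y. (x or y) and (not x or not y), with x = 1 and y = 2
phi = {frozenset({1, 2}), frozenset({-1, -2})}
print(sorted(map(sorted, expand_universal(phi, 1, {2}, 10))))  # [[-12], [2]]
```

The result says that the copy $y'$ (variable 12) must be false when $x = 1$ and $y$ must be true when $x = 0$, which is satisfiable, as expected for this true formula.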


Universal reduction allows us to drop universal variables from clauses when none of the existential variables in that clause may depend on them. Let $C$ be a clause of a QBF and let $l$ be a literal of a universally quantified variable in $C$. Let us further assume that $\bar{l}$ does not occur in $C$. If for all existential variables $v$ in $C$ we have $\mathit{var}(l) \notin \mathit{dep}(v)$, universal reduction allows us to remove $l$ from $C$. The resulting formula is equivalent to the original formula.
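Universal reduction on a single clause can be sketched as follows, again assuming DIMACS-style integer literals. The set of universal variables and the dependency sets $\mathit{dep}(v)$ are passed in explicitly; the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set


def universal_reduce(clause: Iterable[int], universals: Set[int],
                     dep: Dict[int, Set[int]]) -> FrozenSet[int]:
    """Remove each universal literal l from the clause if the negation of l does
    not occur in it and no existential variable v of the clause has var(l) in dep(v)."""
    clause = frozenset(clause)
    existentials = [abs(l) for l in clause if abs(l) not in universals]
    reduced = set(clause)
    for lit in clause:
        var = abs(lit)
        if var in universals and -lit not in clause:
            if all(var not in dep[v] for v in existentials):
                reduced.discard(lit)
    return frozenset(reduced)


# Universals 1 and 2; existential 3 depends only on 1, so literal 2 can be dropped.
print(sorted(universal_reduce({1, 2, 3}, {1, 2}, {3: {1}})))  # [1, 3]
```

A clause without any existential variable is reduced to the empty clause, which is why such clauses make the formula trivially false.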


Stack. For convenience, we use a stack data structure to describe the algorithm. Formally, a stack is a finite sequence. Given a stack $S$, we use $S(i)$ to denote the $i$-th element of the stack, starting with index 0, and we use $S.S'$ to denote concatenation. We use $S[0, i]$ to denote the prefix up to element $i$ of $S$. All stacks we consider are stacks of sets. In a slight abuse of notation, we also use stacks as the union of their elements when it is clear from the context. We also introduce an operation specific to stacks of sets $S$: we define $\mathit{add}(S, i, x)$ to be the stack that results from extending the set on level $i$ by element $x$.
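The stack operations can be modeled with immutable Python tuples of frozensets. In this sketch, reading $S[0, i]$ as including element $i$ is an assumption, and the variable names in the usage lines are hypothetical.

```python
from typing import FrozenSet, Tuple

Stack = Tuple[FrozenSet, ...]  # a stack of sets, level 0 first


def add(S: Stack, i: int, x) -> Stack:
    """add(S, i, x): the stack obtained by extending the set on level i by x."""
    return S[:i] + (S[i] | {x},) + S[i + 1:]


def prefix(S: Stack, i: int) -> Stack:
    """S[0, i]: the prefix of S up to element i (assumed inclusive)."""
    return S[: i + 1]


def flatten(S: Stack) -> FrozenSet:
    """A stack used as the union of its elements."""
    return frozenset().union(*S)


D = (frozenset({"x1", "x2"}),)   # stack of height 1 containing the universals
D = D + (frozenset(),)           # concatenation D.∅ opens a new, empty level
D = add(D, len(D) - 1, "y1")     # add y1 to the last level
print(sorted(flatten(D)))        # ['x1', 'x2', 'y1']
```

Using immutable tuples keeps earlier stack states intact, which makes backtracking to a prefix a matter of slicing.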


2.1 Unique Skolem Functions 


Incremental determinization builds on the notion of unique Skolem functions. Let $\forall X. \exists Y. \varphi$ be a 2QBF in PCNF and let $\chi$ be a formula over $X$ characterizing the domain of the Skolem functions we are currently interested in. We say that a variable $v \in Y$ has a unique Skolem function for domain $\chi$ if for each assignment $\mathbf{x}$ with $\chi(\mathbf{x})$ there is a unique assignment $\mathbf{v}$ to $v$ such that $\varphi(\mathbf{x}, \mathbf{v})$ is satisfiable. In particular, a unique Skolem function is a Skolem function:


Lemma 1. If all existential variables have a unique Skolem function for the full domain $\chi = 1$, the formula is true.


The semantic characterization of unique Skolem functions above does not help us with the computation of Skolem functions directly. We now introduce a local approximation of unique Skolem functions and show how it can be used as a propagation procedure.

We consider a set of variables $D \subseteq X \cup Y$ with $D \supseteq X$ and focus on the subset $\varphi|_D$ of clauses that only contain variables in $D$. We further assume that the existential variables in $D$ already have unique Skolem functions for $\chi$ in the formula $\varphi|_D$. We now define how to extend $D$ by an existential variable $v \notin D$. To define a Skolem function for $v$, we only consider the clauses with unique consequence $v$, denoted $U_v$, that contain a literal of $v$ and otherwise only literals of variables in $D$. (Note that $\varphi|_D \cup U_v = \varphi|_{D \cup \{v\}}$.) We define that variable $v$ has a unique Skolem function relative to $D$ for $\chi$ if for all assignments to $D$ satisfying $\chi$ and $\varphi|_D$ there is a unique assignment to $v$ satisfying $U_v$.
In order to determine unique Skolem functions relative to a set $D$ in practice, we split the definition into the two statements deterministic and unconflicted. Each statement can be checked by a SAT solver, and together they imply that variable $v$ has a unique Skolem function relative to $D$.

Given a clause $C$ with unique consequence $v$, let us call $\neg(C \setminus \{v, \neg v\})$ the antecedent of $C$. Further, let $A_l = \bigvee_{C \in U_v,\, l \in C} \neg(C \setminus \{v, \neg v\})$ be the disjunction of antecedents for the unique consequences containing the literal $l$ of $v$. It is clear that whenever $A_v$ is satisfied, $v$ needs to be true, and whenever $A_{\neg v}$ is satisfied, $v$ needs to be false. We define:


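$$\mathit{deterministic}(v, \varphi, \chi, D) := \forall D.\ \varphi|_D \wedge \chi \Rightarrow A_v \vee A_{\neg v}$$
$$\mathit{unconflicted}(v, \varphi, \chi, D) := \forall D.\ \varphi|_D \wedge \chi \Rightarrow \neg(A_v \wedge A_{\neg v})$$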


deterministic states that $v$ needs to be assigned either true or false for every assignment to $D$ in the domain $\chi$ that is consistent with the existing Skolem function definitions $\varphi|_D$. Accordingly, unconflicted states that $v$ does not have to be true and false at the same time (which would indicate a conflict) for any such assignment. Unique Skolem functions relative to a set $D$ approximate unique Skolem functions as follows:
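The two judgements can be made concrete by enumerating all assignments to $D$ instead of calling a SAT solver. The Python sketch below assumes DIMACS-style integer literals, represents the domain $\chi$ as a Python predicate over an assignment, and uses hypothetical helper names. It returns the pair (deterministic, unconflicted) for a variable $v \notin D$.

```python
from itertools import product
from typing import Callable, Dict, Set, Tuple

Assignment = Dict[int, bool]


def lit_true(lit: int, a: Assignment) -> bool:
    """Value of a DIMACS-style literal under assignment a."""
    return a[abs(lit)] if lit > 0 else not a[abs(lit)]


def antecedents_hold(clauses, v: int, lit: int, D: Set[int], a: Assignment) -> bool:
    """A_lit under a: some clause with unique consequence v contains lit and all of
    its other literals (over D) are false, i.e. its antecedent holds."""
    for c in clauses:
        others = c - {v, -v}
        if lit in c and all(abs(l) in D for l in others):
            if not any(lit_true(l, a) for l in others):
                return True
    return False


def check(clauses, v: int, chi: Callable[[Assignment], bool], D: Set[int]) -> Tuple[bool, bool]:
    """Return (deterministic, unconflicted) for v by enumerating assignments to D."""
    order = sorted(D)
    restricted = [c for c in clauses if all(abs(l) in D for l in c)]  # phi|_D
    deterministic, unconflicted = True, True
    for bits in product((False, True), repeat=len(order)):
        a = dict(zip(order, bits))
        if not chi(a) or not all(any(lit_true(l, a) for l in c) for c in restricted):
            continue  # outside the domain chi, or inconsistent with phi|_D
        pos = antecedents_hold(clauses, v, v, D, a)
        neg = antecedents_hold(clauses, v, -v, D, a)
        deterministic = deterministic and (pos or neg)
        unconflicted = unconflicted and not (pos and neg)
    return deterministic, unconflicted


# v = 2 with clauses (not x or v) and (x or not v), i.e. v <-> x, over D = {x} = {1}
phi = [frozenset({-1, 2}), frozenset({1, -2})]
print(check(phi, 2, lambda a: True, {1}))  # (True, True)
```

Adding the clause $(\neg x \vee \neg v)$ makes both antecedents hold when $x$ is true, so unconflicted fails; dropping $(x \vee \neg v)$ leaves $v$ unconstrained when $x$ is false, so deterministic fails unless the domain $\chi$ excludes that case.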


Lemma 2. Let the existential variables in $D$ have unique Skolem functions for domain $\chi$ and let $v \in Y$ have a unique Skolem function relative to $D$ for domain $\chi$. Then $v$ has a unique Skolem function for domain $\chi$.


3 Inference Rules for Incremental Determinization 


In this section, we develop a nondeterministic algorithm that formalizes and 
generalizes ID. We describe the algorithm in terms of inference rules that specify 
how the state of the algorithm can be developed. The state of the algorithm 
consists of the following elements: 


– The solver status S ∈ {Ready, Conflict(L, x), SAT, UNSAT}. The conflict sta-
tus has two parameters: a clause L that is used to compute the learnt clause,
and the assignment x to the universals witnessing the conflict.

– A stack C of sets of clauses. C(0) contains the original and the learnt clauses.
C(i) for i > 0 contains temporary clauses introduced by decisions.

– A stack D of sets of variables. The union of all levels in the stack represents the
set of variables that currently have unique Skolem functions, and the clauses
in C|_D represent these Skolem functions. D(0) contains the universals and
the existentials whose Skolem functions do not depend on decisions.

– A formula χ over D(0) characterizing the set of assignments to the universals
for which we still need to find a Skolem function.

– A formula α over variables D(0) representing a temporary restriction on the
domain of the Skolem functions.
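The state can be pictured as a record that holds two stacks. The following Python sketch is a hypothetical data structure and not CADET's implementation. The class IDState and its method names are invented for illustration. Two further points are our own assumptions: DECIDE pushes a new empty level onto D, and BACKTRACK's C[0, dlvl] means "keep the first dlvl levels".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

Clause = Tuple[int, ...]          # assumed encoding: DIMACS-style int literals
Assignment = Dict[int, bool]

@dataclass
class IDState:
    """Hypothetical container for the ID state (S, C, D, chi, alpha)."""
    status: str                   # "Ready", "Conflict", "SAT" or "UNSAT"
    C: List[List[Clause]]         # C[0]: original + learnt; C[i > 0]: decisions
    D: List[Set[int]]             # D[0]: universals + decision-free existentials
    chi: Callable[[Assignment], bool] = lambda a: True    # domain still open
    alpha: Callable[[Assignment], bool] = lambda a: True  # temporary restriction
    conflict: Optional[Tuple[frozenset, Assignment]] = None  # (L, x) if Conflict

    def clauses(self) -> List[Clause]:
        """Union of all levels of the clause stack C."""
        return [c for level in self.C for c in level]

    def defined(self) -> Set[int]:
        """Union of all levels of D (variables with unique Skolem functions)."""
        return set().union(*self.D)

    def add_to_top(self, v: int) -> None:
        """add(D, |D|-1, v): put v on the last level of D (PROPAGATE)."""
        self.D[-1].add(v)

    def push_decision(self, delta: List[Clause]) -> None:
        """Open a new decision level: C.delta and an empty level on D (DECIDE)."""
        self.C.append(list(delta))
        self.D.append(set())

    def learn(self, L: Clause) -> None:
        """add(C, 0, L): store a learnt clause on level 0 (LEARN)."""
        self.C[0].append(L)

    def backtrack(self, dlvl: int) -> None:
        """Keep the first dlvl levels of both stacks (BACKTRACK)."""
        assert 0 < dlvl < len(self.C)
        self.C = self.C[:dlvl]
        self.D = self.D[:dlvl]

def initial_state(phi: List[Clause], universals: Set[int]) -> IDState:
    """(Ready, phi, X, true, true): both stacks start with height 1."""
    return IDState("Ready", [list(phi)], [set(universals)])
```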


We assume that we are given a 2QBF in PCNF ∀X. ∃Y. φ and that all clauses
in φ contain an existential variable. (If φ contains a non-tautological clause
without existential variables, the formula is trivially false by universal reduction.)
We define (Ready, φ, X, ⊤, ⊤) to be the initial state of the algorithm. That is, the
clause stack C initially has height 1 and contains the clauses of the formula φ.
We initialize D as the stack of height 1 containing the universals.


PROPAGATE
$$\frac{(\mathit{Ready}, C, D, \chi, \alpha) \quad v \notin D \quad \mathit{deterministic}(v, C, \chi \wedge \alpha, D) \quad \mathit{unconflicted}(v, C, \chi \wedge \alpha, D)}{(\mathit{Ready}, C, \mathit{add}(D, |D|-1, v), \chi, \alpha)}$$

DECIDE
$$\frac{(\mathit{Ready}, C, D, \chi, \alpha) \quad v \notin D \quad \text{all } c \in \delta \text{ have unique consequence } v}{(\mathit{Ready}, C.\delta, D.\emptyset, \chi, \alpha)}$$

SAT
$$\frac{(\mathit{Ready}, C, D, \chi, \top) \quad D = X \cup Y}{(\mathit{SAT}, C, D, \chi, \top)}$$


Fig. 1. Inference rules needed to prove true QBF

Before we dive into the inference rules, we want to point out that some of the
rules in this calculus are not computable in polynomial time. The judgements
deterministic and unconflicted require us to solve a SAT problem and are, in
general, NP-complete. This is still easier than the 2QBF problem itself (unless
NP includes Π₂ᴾ), and in practice they can be discharged quickly by SAT solvers.


3.1 True QBF 


We continue by describing the basic version of ID, consisting of the rules in
Figs. 1 and 2, and first focus on the rules in Fig. 1, which suffice to prove true
2QBFs. PROPAGATE allows us to add a variable to D if it has a unique Skolem
function relative to D. (The notation add(D, |D|−1, v) means that we add v to
the last level of the stack.) The judgements deterministic and unconflicted involve
the current set of clauses C (i.e. the union of all sets of clauses in the stack
C). These checks are restricted to the domain χ ∧ α. Both χ and α are ⊤
throughout this section; we discuss their use in Sects. 4 and 5.


Invariant 1. All existential variables in D have a unique Skolem function for
the domain χ ∧ α in the formula ∀X. ∃Y. C|_D, where C|_D are the clauses in C
that contain only variables in D.


If PROPAGATE identifies all variables to have unique Skolem functions relative 
to the growing set D, we know that they also have unique Skolem functions 
(Lemma 2). We can then apply SAT to reach the SAT state, representing that 
the formula has been proven true (Lemma 1). 


Lemma 3. ID cannot reach the SAT state for false QBF. 
Proof. Let us assume we reached the SAT state for a false 2QBF and prove the
statement by way of contradiction. The SAT state can only be reached by the
rule SAT, which requires D = X ∪ Y. By Invariant 1, all variables have a Skolem
function in ∀X. ∃Y. C. Since C includes φ, this Skolem function does not violate
any clause in φ, which means it is indeed a proof that the 2QBF is true, a contradiction.


When PROPAGATE is unable to determine the existence of a unique Skolem 
function (i.e. for variables where the judgement deterministic does not hold) we 
can use the rule DECIDE to introduce additional clauses such that deterministic 
holds and propagation can continue. Note that additional clauses make it easier 
to satisfy deterministic and adding the clause v (i.e. a unit clause) even ensures 
that deterministic holds for v. 

Assuming we consider a true 2QBF, we can pick a Skolem function f_v for each
existential variable v and encode it using DECIDE. We can simply consider the
truth table of f_v in terms of the universal variables and define δ to be the set of
clauses {¬x ∨ v | f_v(x)} ∪ {¬x ∨ ¬v | ¬f_v(x)}. (Here we interpret the assignment
x as a conjunction of literals.) These clauses have unique consequence v and they
guarantee that v is deterministic. Further, they guarantee that v is unconflicted,
as otherwise f_v would not be a Skolem function, so we can apply PROPAGATE
to add v to D. Repeating this process for every variable lets us reach the point
where Y ⊆ D, and we can apply SAT to reach the SAT state.
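The truth-table construction of δ can be spelled out directly. The Python sketch below uses hypothetical helper names and assumes DIMACS-style integer literals. truth_table_decision builds one clause per assignment to the universals. forced_value then checks that, under every assignment x, the clauses force v to f_v(x).

```python
from itertools import product

def truth_table_decision(v, universals, f):
    """Clauses {not x or v | f(x)} U {not x or not v | not f(x)} for all x.

    `not x` is the clause negating the conjunction of literals that the
    assignment x makes true; literals are nonzero ints (assumed encoding)."""
    delta = []
    for bits in product([False, True], repeat=len(universals)):
        x = dict(zip(universals, bits))
        not_x = tuple(-u if x[u] else u for u in universals)
        delta.append(not_x + ((v,) if f(x) else (-v,)))
    return delta

def forced_value(delta, v, x):
    """Value forced on v by delta under x, or None if no clause fires."""
    for c in delta:
        if all(x[abs(l)] != (l > 0) for l in c if abs(l) != v):
            return v in c
    return None

# Example: encode f(x) = x1 and x2 for v = y1 (variables 1, 2 and 3).
DELTA = truth_table_decision(3, [1, 2], lambda x: x[1] and x[2])
```

The clause set has one entry per universal assignment, which is why guessing full truth tables is only a proof device and not a practical decision strategy.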


Lemma 4. ID can reach the SAT state for true QBF. 


Note that proving the truth of a QBF in this way requires guessing correct
Skolem functions for all existentials. In Subsect. 3.4 we discuss how termination
is guaranteed with a simpler type of decisions.


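CONFLICT
$$\frac{(\mathit{Ready}, C, D, \chi, \alpha) \quad x \text{ refutes } \mathit{unconflicted}(v, C, \chi \wedge \alpha, D)}{(\mathit{Conflict}(\{v, \neg v\}, x), C, D, \chi, \alpha)}$$

ANALYZE
$$\frac{(\mathit{Conflict}(L, x), C, D, \chi, \alpha) \quad c \in C(0) \quad l \in L \quad \neg l \in c}{(\mathit{Conflict}(L \otimes_{\mathit{var}(l)} c, x), C, D, \chi, \alpha)}$$

LEARN
$$\frac{(\mathit{Conflict}(L, x), C, D, \chi, \alpha) \quad \mathit{var}(L) \not\subseteq D}{(\mathit{Ready}, \mathit{add}(C, 0, L), D, \chi, \alpha)}$$

UNSAT
$$\frac{(\mathit{Conflict}(L, x), C, D, \chi, \alpha) \quad \mathit{var}(L) \subseteq D(0) \quad x \not\models L}{(\mathit{UNSAT}, C, D, \chi, \alpha)}$$

BACKTRACK
$$\frac{(S, C, D, \chi, \alpha) \quad 0 < \mathit{dlvl} < |C|}{(S, C[0, \mathit{dlvl}], D[0, \mathit{dlvl}], \chi, \alpha)}$$


Fig. 2. Additional inference rules needed to disprove false QBF


3.2 False QBF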


To disprove false 2QBFs, i.e. formulas that do not have a Skolem function, we
need the rules in Fig. 2 in addition to PROPAGATE and DECIDE from Fig. 1.
The conflict state can only be reached via the rule CONFLICT, which requires
that a variable v is conflicted, i.e. unconflicted fails. The CONFLICT rule stores
the assignment x to D that proves the conflict, and it creates the nucleus of
the learnt clause, {v, ¬v}. Via ANALYZE we can then resolve that nucleus with
clauses in C(0), which consists of the original clauses and the clauses learnt so
far. We are allowed to add the learnt clause back to C(0) by applying LEARN.
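A single ANALYZE step is ordinary propositional resolution. The Python sketch below uses our own encoding (integer literals and frozensets), and its function name is hypothetical. It implements L ⊗_var(l) c and replays the two resolution steps that the example in Subsect. 3.3 performs on the nucleus {y4, ¬y4}.

```python
def analyze(L, c, l):
    """One ANALYZE step: resolve clause L with c in C(0) on var(l).

    Requires l in L and -l in c (literals are nonzero ints, an assumed
    encoding); returns the resolvent L (x)_{var(l)} c as a frozenset."""
    if l not in L or -l not in c:
        raise ValueError("ANALYZE needs l in L and -l in c")
    return frozenset((set(L) - {l}) | (set(c) - {-l}))

# Nucleus {v, -v} for v = y4 (variable 6), resolved with the clauses of
# line (4) in Subsect. 3.3: (-y1 or y4) and (-y3 or -y4); y1 = 3, y3 = 5.
L = frozenset({6, -6})
L = analyze(L, (-3, 6), -6)    # resolvent {y4, -y1}
L = analyze(L, (-5, -6), 6)    # resolvent {-y1, -y3}
```

Because the nucleus is a tautology, the first step merely copies the other literals of the first clause into L. Only the second step is an actual resolution.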


Invariant 2. C(0) is equivalent to φ.


Note that C(0) and φ are propositional formulas over X ∪ Y. Their equiv-
alence means that they have the same set of satisfying assignments. We prove
Invariant 2 together with the next invariant.


Invariant 3. Clause L in conflict state Conflict(L, x) is implied by φ.


Proof. C(0) contains φ initially and is only ever changed by adding clauses
through the LEARN rule, so C(0) ⇒ φ holds throughout the computation.

We prove the other direction of Invariant 2, together with Invariant 3, by mutual
induction. Initially, C(0) consists exactly of the clauses φ, satisfying Invariant 2. The
nucleus of the learnt clause, v ∨ ¬v, is trivially true, so it is implied by any formula,
which gives us the base case of Invariant 3. ANALYZE is the only rule modifying L,
and hence soundness of resolution together with Invariant 2 already gives us the
induction step for Invariant 3 [30]. Since LEARN is the only rule changing C(0),
Invariant 3 implies the induction step of Invariant 2.


When adding the learnt clause to C(0) we have to make sure that Invariant 1 
is preserved. LEARN hence requires that we have backtracked far enough with 
BACKTRACK, such that at least one of the variables in L is not in D anymore. 
In this way, L may become part of future Skolem function definitions, but will 
first have to be checked for causing conflicts by PROPAGATE. 

If all variables in L are in D(0) and the assignment x from the conflict violates
L, we can conclude the formula to be false using UNSAT. The soundness of this
step follows from the fact that x includes an assignment satisfying C(0)|_D(0)
(i.e. the clauses defining the Skolem functions for D(0)), together with Invariants 1 and 3.


Lemma 5. ID cannot reach the UNSAT state for true QBF. 


We will now show that we can disprove any false QBF. The main difficulty
in this proof is to show that from any Ready state we can learn a new clause, i.e.
a clause that is semantically different from any clause in C(0), and then return to
the Ready state. Since there are only finitely many semantically different clauses
over variables X ∪ Y, and we cannot terminate in any other way (Lemma 3), we
eventually have to find a clause L with var(L) ⊆ D(0), which enables us to go
to the UNSAT state.
From the Ready state, we can always add more variables to D with DECIDE
and PROPAGATE, until we reach a conflict. (Otherwise we would reach a state
where D = X ∪ Y and we would be able to prove SAT, contradicting Lemma 3.) We only enter
a Conflict state for a variable v if there are two clauses (c₁ ∨ v) and (c₂ ∨ ¬v) with
unique consequence v such that x ⊨ ¬c₁ ∧ ¬c₂ (see the definition of unconflicted).

In order to apply ANALYZE, we need to make sure that (c₁ ∨ v) and (c₂ ∨ ¬v)
are in C(0). We can guarantee this by restricting DECIDE as follows: We say
a decision for a variable v is consistent with the unique consequences in state
(Ready, C, D, χ, α) if unconflicted(v, C.δ, χ ∧ α, D). We can construct such a deci-
sion easily by applying DECIDE only on variables that are not conflicted already
(i.e. unconflicted(v, C, χ ∧ α, D)) and by defining δ to be the CNF representation
of ¬A_v ⇒ ¬v (i.e. require v to be false, unless a unique consequence containing
literal v applies). It is clear that for this δ no new conflict for v is introduced
and hence unconflicted(v, C.δ, χ ∧ α, D) holds.
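The consistent decision δ can be computed mechanically from the clauses with unique consequence v. The Python sketch below is an illustration that rests on our own assumptions. It obtains the CNF of ¬A_v ⇒ ¬v, i.e. ¬v ∨ A_v, by distributing ¬v over the DNF A_v. That step is exponential in the number of unique-consequence clauses, so the sketch is only meant for small examples. Literals are DIMACS-style integers, and the function name is hypothetical.

```python
from itertools import product

def consistent_decision(v, clauses, D):
    """CNF of (not A_v) => (not v), i.e. (not v) or A_v.

    A_v is the disjunction, over clauses (c' or v) with unique consequence v,
    of the conjunction of the negated literals of c'. Distributing (not v)
    over that DNF gives one clause per choice of a literal l_i from each c'_i:
    (not v) or (not l_1) or ... or (not l_k). Every such clause has unique
    consequence -v, as DECIDE requires."""
    bodies = [[l for l in c if l != v] for c in clauses
              if v in c and all(abs(l) in D for l in c if l != v)]
    delta = {tuple(sorted({-v} | {-l for l in choice}))
             for choice in product(*bodies)}
    return sorted(delta)

# Formula of Subsect. 3.3 (our numbering: x1, x2 = 1, 2; y1..y4 = 3..6).
PHI = [(1, -3), (2, -3), (-1, -2, 3), (-2, 4), (-3, 4), (2, 3, -4),
       (3, -5), (4, -5), (-3, 6), (-5, -6)]
```

If v has no clause with unique consequence v, A_v is false and δ is the unit clause ¬v. If some such clause is the unit clause v, A_v is true and δ is empty.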

Assuming that all decisions are taken consistent with the unique conse-
quences, we know that when we encounter a conflict for variable v, we did not
apply DECIDE for v, and hence the clauses (c₁ ∨ v) and (c₂ ∨ ¬v) causing the con-
flict must be in C(0). We can hence apply ANALYZE twice with clauses (c₁ ∨ v)
and (c₂ ∨ ¬v) and obtain the learnt clause L = c₁ ∨ c₂. Since x ⊨ ¬c₁ ∧ ¬c₂,
the learnt clause is violated by x. As x refutes unconflicted(v, C, χ ∧ α, D) by
construction, it must satisfy the clauses C|_D, and hence the learnt clause L cannot
be in C|_D. Further, L only contains variables that are in D, as (c₁ ∨ v) and
(c₂ ∨ ¬v) were clauses with unique consequence v. So L would have been in
C|_D if it existed in C already, and hence L is new. We can either add the new
clause to C(0) after backtracking, or we can conclude UNSAT.


Lemma 6. ID can reach the UNSAT state for false QBF. 


The clause learning process considered here only applies one actual resolution
step per conflict ((c₁ ∨ v) ⊗_v (c₂ ∨ ¬v)). In practice, we probably want to apply
multiple resolution steps before applying LEARN. It is possible to use the conflicting
assignment x to (implicitly) construct an implication graph and mimic the clause
learning of SAT solvers [8,31].


3.3 Example 


We now discuss the application of the inference rules to the following formula:


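$$
\begin{aligned}
\forall x_1, x_2.\ \exists y_1, \ldots, y_4.\quad
& (x_1 \vee \neg y_1) \wedge (x_2 \vee \neg y_1) \wedge (\neg x_1 \vee \neg x_2 \vee y_1) \wedge && (1)\\
& (\neg x_2 \vee y_2) \wedge (\neg y_1 \vee y_2) \wedge (x_2 \vee y_1 \vee \neg y_2) \wedge && (2)\\
& (y_1 \vee \neg y_3) \wedge (y_2 \vee \neg y_3) \wedge && (3)\\
& (\neg y_1 \vee y_4) \wedge (\neg y_3 \vee \neg y_4) && (4)
\end{aligned}
$$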
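The example is small enough that its truth can be confirmed by brute force. The Python sketch below encodes the formula with DIMACS-style literals, using our own numbering x1, x2 ↦ 1, 2 and y1, ..., y4 ↦ 3, ..., 6, and checks ∀X. ∃Y. φ by enumeration. The function skolem gives one set of Skolem functions consistent with the run described in this subsection (y1 = x1 ∧ x2, y2 = x2, y3 false). Fixing y4 to true is our own choice for the final decision.

```python
from itertools import product

# Assumed numbering: x1, x2 -> 1, 2 (universal); y1..y4 -> 3..6 (existential).
X, Y = [1, 2], [3, 4, 5, 6]
PHI = [(1, -3), (2, -3), (-1, -2, 3),   # line (1)
       (-2, 4), (-3, 4), (2, 3, -4),    # line (2)
       (3, -5), (4, -5),                # line (3)
       (-3, 6), (-5, -6)]               # line (4)

def satisfies(a, clauses):
    """True if assignment a (var -> bool) satisfies every clause."""
    return all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)

def is_true(clauses, X, Y):
    """Brute-force semantics of forall X. exists Y. clauses."""
    return all(any(satisfies({**dict(zip(X, xb)), **dict(zip(Y, yb))}, clauses)
                   for yb in product([False, True], repeat=len(Y)))
               for xb in product([False, True], repeat=len(X)))

def skolem(x):
    """One choice of Skolem functions for the example (y4 fixed to true)."""
    return {3: x[1] and x[2], 4: x[2], 5: False, 6: True}
```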


Initially, the state of the algorithm is the tuple (Ready, φ, X, ⊤, ⊤). The rule
PROPAGATE can be applied to y1 in the initial state, as we are in the Ready
state, y1 ∉ X, and y1 satisfies the checks deterministic and unconflicted:
The antecedents of y1 are A_y1 = x1 ∧ x2 and A_¬y1 = ¬x1 ∨ ¬x2 (see clauses
in line (1)). It is easy to check that both A_y1 ∨ A_¬y1 and ¬(A_y1 ∧ A_¬y1) hold
for all assignments to x1 and x2. The state resulting from the application of
PROPAGATE is (Ready, φ, X ∪ {y1}, ⊤, ⊤). (Alternatively, we could apply DECIDE
in the initial state, but deriving unique Skolem functions is generally preferable.)

While PROPAGATE was not applicable to y2 before, it now is, as the increased
set D made y2 deterministic (see clauses in line (2)). We can thus derive the state
(Ready, φ, X ∪ {y1, y2}, ⊤, ⊤).

Now we have run out of variables to propagate, and the only applicable rule
is DECIDE. We arbitrarily choose y3 as our decision variable and arbitrar-
ily introduce a single clause δ = {(¬y1 ∨ ¬y2 ∨ y3)}, arriving in the state
(Ready, φ.δ, X ∪ {y1, y2}, ⊤, ⊤). We can then immediately apply PROPAGATE
(consider δ and the clauses in line (3)) to add the decision variable to the set D
and arrive at (Ready, φ.δ, X ∪ {y1, y2, y3}, ⊤, ⊤).

We could now apply BACKTRACK to undo the last decision, but this would not
be productive. Instead, we identify y4 as conflicted and enter a conflict state
with CONFLICT: (Conflict({y4, ¬y4}, x1 ∧ x2), φ.δ, X ∪ {y1, y2, y3}, ⊤, ⊤). To resolve
the conflict, we apply ANALYZE twice (once with each of the clauses in line (4)),
bringing us into state (Conflict({¬y1, ¬y3}, x1 ∧ x2), φ.δ, X ∪ {y1, y2, y3}, ⊤, ⊤).
We can backtrack one level such that D = X ∪ {y1, y2} and then apply LEARN
to enter state (Ready, φ ∪ {(¬y1 ∨ ¬y3)}, X ∪ {y1, y2}, ⊤, ⊤).
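The conflict in this step can be replayed mechanically. The Python sketch below relies on our own assumptions: brute-force enumeration, integer literals, and hypothetical helper names. It enumerates the assignments to D = X ∪ {y1, y2, y3} that satisfy the clauses of φ.δ over D and that also fire both antecedents of y4. It finds a single witness, x1 ∧ x2 (with y1, y2, y3 true). It then performs the two ANALYZE steps, which yield ¬y1 ∨ ¬y3.

```python
from itertools import product

# Assumed numbering: x1, x2 = 1, 2; y1..y4 = 3..6.
PHI = [(1, -3), (2, -3), (-1, -2, 3), (-2, 4), (-3, 4), (2, 3, -4),
       (3, -5), (4, -5), (-3, 6), (-5, -6)]
DELTA = [(-3, -4, 5)]          # the decision (-y1 or -y2 or y3)
D = [1, 2, 3, 4, 5]            # X U {y1, y2, y3}

def holds(lit, a):
    return a[abs(lit)] == (lit > 0)

def antecedent(clauses, lit, D, a):
    """A_lit under a: a clause with unique consequence lit forces lit."""
    return any(all(not holds(l, a) for l in c if l != lit)
               for c in clauses
               if lit in c and all(abs(l) in D for l in c if l != lit))

def conflict_witnesses(v, clauses, D):
    """Assignments to D satisfying C|_D under which A_v and A_-v both hold."""
    restricted = [c for c in clauses if all(abs(l) in D for l in c)]
    for bits in product([False, True], repeat=len(D)):
        a = dict(zip(D, bits))
        if (all(any(holds(l, a) for l in c) for c in restricted)
                and antecedent(clauses, v, D, a) and antecedent(clauses, -v, D, a)):
            yield a

def analyze(L, c, l):
    """Resolve L with c on var(l), where l is in L and -l is in c."""
    return (L - {l}) | (set(c) - {-l})

witnesses = list(conflict_witnesses(6, PHI + DELTA, D))
x = witnesses[0]
L = {6, -6}                    # CONFLICT nucleus {y4, -y4}
L = analyze(L, (-3, 6), -6)    # first clause of line (4)
L = analyze(L, (-5, -6), 6)    # second clause of line (4)
```

The final assertion in the check confirms the property used in the proof of Lemma 6: the witness x violates the learnt clause.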

The rest is simple: we apply PROPAGATE on y3 and take a decision for y4. As 
no other variable can depend on y4 we can take an arbitrary decision for y4 that 
makes y4 deterministic, as long as this does not make y4 conflicted. Finally, we 
can propagate y4 and then apply SAT to conclude that we have found Skolem 
functions for all existential variables. 


3.4 Termination 


So far, we have described sound and nondeterministic algorithms that allow us 
to prove or disprove any 2QBF. We can easily turn the algorithm in the proof 
of Lemma 6 into a deterministic algorithm that terminates for both true and 
false QBF by introducing an arbitrary ordering of variables and assignments: 
Whenever there is nondeterminism in the application of one of the rules as 
described in Lemma 6, pick the smallest variable for which one of the rules is 
applicable. When multiple rules are applicable for that variable, pick them in 
the order they appear in the figures. When the inference rule allows multiple 
assignments, pick the smallest. In particular, this guarantees that the existential 
variables are added to D in the arbitrarily picked order, as for any existential 
not in D we can either apply PROPAGATE, DECIDE, or CONFLICT. 

Restricting DECIDE to decisions that are consistent with the unique con-
sequences may be unintuitive for true QBF, where we try to find a Skolem
function. However, whenever we make the 2QBF false by introducing clauses
with DECIDE, we will eventually go to a conflict state and learn a new clause.
Deriving the learnt clause for conflicted variable v from two clauses with unique
consequence v (as described for Lemma 6) means that we push the constraints
towards smaller variables in the variable ordering. The learnt clause will thus
improve the Skolem function for a smaller variable or cause another conflict for
a smaller variable. In the extreme case, we will eventually learn clauses that look
like function table entries, as used in Lemma 4, i.e. clauses containing exactly
one existential variable. At some point, even with our restriction for DECIDE, we
cannot make a “wrong” decision: The cases for which a variable does not have
a clause with unique consequence are either irrelevant for the satisfaction of the
2QBF or our restricted decisions happen to make the right assignment.


            | SAT ∃Y. φ                    | 2QBF ∀X. ∃Y. φ
State       | Partial assignment of values | Partial assignment of functions
Propagation | Unit propagation             | Unique Skolem function w.r.t. D
Decision    | Unit clause                  | Clause with unique consequence
Conflict    | Unit clauses y and ¬y        | Assignment x that implies y and ¬y
Learning    | Clause                       | Clause


Fig. 3. Concepts in ID and their counterparts in CDCL

In cases where no static ordering of variables is used, as will be the case in
any practical approach, termination for true QBF is less obvious but follows
the same argument: Given enough learnt clauses, the relationships between the
variables are dense enough that even naive decisions suffice.


3.5 Pure Literals 


The original paper on ID introduces the notion of pure literals for QBF that 
allows us to propagate a variable v even if it is not deterministic, if for a literal l 
of v, all clauses c that l occurs in are either satisfied or l is the unique consequence 
of c. The formalization presented in this section allows us to conclude that pure 
literals are a special case of DECIDE: We can introduce clauses defining v to be 
of polarity ¬l whenever all clauses containing l are satisfied by another literal.

That is, we can precisely characterize the minimal set of cases in which v has 
to be of polarity l and the decision is guaranteed to never introduce unnecessary 
conflicts. The same definition cannot be made when l occurs in clauses where it
is not a unique consequence, as then the clause contains another variable that 
is not deterministic yet. 


3.6 Relation of ID and CDCL 


There are some obvious similarities between ID and conflict-driven clause learn- 
ing (CDCL) for SAT. Both algorithms modify their partial assignments by propa- 
gation, decisions, clause learning, and backtracking. The main difference between 
the algorithms is that, while CDCL solvers maintain a partial assignment of 
Boolean values to variables, ID maintains a partial assignment of functions to 
variables (which is represented by the clauses C|_D). We summarize our obser-
vations in Fig. 3.
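INDUCTIVEREFINEMENT
$$\frac{(\mathit{Conflict}(L, x), C, D, \chi, \alpha) \quad (x|_X, y) \models \varphi}{(\mathit{Conflict}(L, x), C, D, \chi \wedge \neg\varphi(y), \alpha)}$$

FAILED
$$\frac{(\mathit{Conflict}(L, x), C, D, \chi, \alpha) \quad \varphi(x|_X) \text{ is unsatisfiable}}{(\mathit{UNSAT}, C, D, \chi, \alpha)}$$


Fig. 4. Inference rules adding inductive reasoning to ID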


4 Inductive Reasoning 


The CEGIS approach to solving a 2QBF ∀X. ∃Y. φ is to iterate over X assign-
ments x and check whether there is an assignment y such that (x, y) satisfies φ. Upon
every successful iteration we exclude all assignments to X for which y is a
matching assignment. If the space of X assignments is exhausted we conclude
the formula is true, and if we find an assignment to X for which there is no
matching Y assignment, the formula is false [21].
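For comparison, the CEGIS loop just described fits in a few lines when the assignment spaces are enumerated explicitly. This Python sketch is a toy version based on our own assumptions (explicit lists of assignments, DIMACS-style integer literals, hypothetical names). A practical CEGIS solver would typically use SAT calls for both the candidate and the exclusion step instead of enumeration.

```python
from itertools import product

def satisfies(a, clauses):
    """True if assignment a (var -> bool) satisfies every clause."""
    return all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)

def cegis(clauses, X, Y):
    """Explicit-state CEGIS for forall X. exists Y. clauses (toy sizes only).

    Returns (True, None) if the 2QBF is true, else (False, x) where x is a
    refuting assignment to X."""
    def assignments(vs):
        return [dict(zip(vs, b)) for b in product([False, True], repeat=len(vs))]

    remaining = assignments(X)             # X-assignments not yet covered
    while remaining:
        x = remaining[0]
        y = next((y for y in assignments(Y) if satisfies({**x, **y}, clauses)), None)
        if y is None:
            return False, x                # no matching Y assignment: false
        # Exclude every X-assignment for which y is also a matching assignment
        # (this always removes x itself, so the loop terminates).
        remaining = [r for r in remaining if not satisfies({**r, **y}, clauses)]
    return True, None

# Formula of Subsect. 3.3 (our numbering: x1, x2 = 1, 2; y1..y4 = 3..6).
PHI = [(1, -3), (2, -3), (-1, -2, 3), (-2, 4), (-3, 4), (2, 3, -4),
       (3, -5), (4, -5), (-3, 6), (-5, -6)]
```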

While this approach shows poor performance on some problems, as discussed
in the introduction, it is widely popular and has been successfully applied in
many cases. In this section we show how it can be integrated into ID
in an elegant way. The simplicity of the CEGIS approach carries over to our
extension of ID: we only need the two additional inference rules in Fig. 4.

We exploit the fact that ID already generates assignments x to X in its
conflict check. Whenever ID is in a conflict state, the rules in Fig. 4 allow us to
check if there is an assignment y to Y that, together with x|_X (the part
of x defining variables in X), satisfies φ. If there is such an assignment y, we can
let the Skolem functions output y for the input x. But the output y may work
for other assignments to X, too. The set of all assignments to X for which y
works as an output is easily characterized by φ(y).¹ INDUCTIVEREFINEMENT
allows us to exclude the assignments φ(y) from χ, which represents the domain
(i.e. assignments to X) for which we still need to find a Skolem function.

This gives rise to a new invariant, stating that ¬χ only includes assignments
to X for which we know that there is an assignment to Y satisfying φ. With this
invariant it is clear that Lemma 3 also holds for arbitrary χ.


Invariant 4. ∀X. ∃Y. ¬χ ⇒ φ


It is easy to check that PROPAGATE preserves Invariant 1 also if χ and α are
not ⊤. Invariants 2 and 3 are unaffected by the rules in this section. To make
sure that Lemma 5 is preserved as well, we thus only have to inspect FAILED,
which is trivially sound.


¹ We can actually exploit the Skolem functions that do not depend on decisions and
instead exclude from χ the set of assignments to D(0) for which the part of y that is
not in D(0) is a solution, i.e. C(0) with the variables outside D(0) fixed according to y.
ASSUME
$$\frac{(\mathit{Ready}, C, D, \chi, \alpha) \quad \mathit{var}(l) \in D(0)}{(\mathit{Ready}, C, D, \chi, \alpha \wedge l)}$$

CLOSE
$$\frac{(\mathit{Ready}, C, D, \chi, \alpha) \quad D = X \cup Y}{(\mathit{Ready}, C(0), D(0), \chi \wedge \neg\alpha, \top)}$$


Fig. 5. Inference rules adding case distinctions to ID


A Portfolio Approach? In principle, we could generate assignments x inde-
pendently from the conflict check of ID. The result would be a portfolio
approach that simply executes ID and CEGIS in parallel and takes the result from
whichever method terminates first. The idea behind our extension is that con-
flict assignments are more selective and may thus increase the probability that
we hit a refuting assignment to X. Also, ID may profit from excluding groups
of assignments that frequently cause conflicts. We revisit this question in
Sect. 6.


Example. We extend the example from Subsect. 3.3 from the point where we
entered the conflict state (Conflict({y4, ¬y4}, x1 ∧ x2), φ.δ, X ∪ {y1, y2, y3}, ⊤, ⊤).
We can apply INDUCTIVEREFINEMENT, checking that there is indeed a solution
to φ for the assignment x1, x2 to the universals (e.g. y1, y2, ¬y3, y4). Instead of
doing the standard conflict analysis as in our previous example, we can apply
LEARN to add the (useless) clause y4 ∨ ¬y4 to C(0) without any backtracking.
That is, we effectively ignore the conflict and go to state
(Ready, (φ ∪ {(y4 ∨ ¬y4)}).δ, X ∪ {y1, y2, y3}, ¬x1 ∨ ¬x2, ⊤).
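The effect of INDUCTIVEREFINEMENT in this example can be computed explicitly. In the Python sketch below, the encoding is ours and χ is represented as an explicit set of assignments to (x1, x2). The sketch evaluates φ(y) for the solution y = (y1, y2, ¬y3, y4) and removes it from χ. What remains are exactly the assignments satisfying ¬x1 ∨ ¬x2.

```python
from itertools import product

# Formula of Subsect. 3.3 (assumed numbering: x1, x2 = 1, 2; y1..y4 = 3..6).
PHI = [(1, -3), (2, -3), (-1, -2, 3), (-2, 4), (-3, 4), (2, 3, -4),
       (3, -5), (4, -5), (-3, 6), (-5, -6)]
X = [1, 2]

def satisfies(a, clauses):
    return all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)

def phi_of_y(clauses, X, y):
    """phi(y): the X-assignments (tuples over X) for which y is a matching output."""
    return {xb for xb in product([False, True], repeat=len(X))
            if satisfies({**dict(zip(X, xb)), **y}, clauses)}

chi = set(product([False, True], repeat=len(X)))   # chi = true: all of X
y = {3: True, 4: True, 5: False, 6: True}            # y1, y2, -y3, y4
chi -= phi_of_y(PHI, X, y)                           # chi := chi and not phi(y)
```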

There is no assignment to X that provokes a conflict for y4, other than the 
one we excluded through INDUCTIVEREFINEMENT. We can thus take an arbitrary 
decision for y4 that is consistent with the unique consequences (see Subsect. 3.2), 
PROPAGATE y4, and then conclude the formula to be true. 


5 Expansion 


Universal expansion (defined in Sect. 2) is another fundamental proof rule that 
deals with universal variables. It has been used in early QBF solvers [10] and 
has later been integrated in CEGAR-style QBF solvers [26,32]. 

One way to look at the expansion of a universal variable x is that it introduces 
a case distinction over the possible values of x in the Skolem functions. However, 
instead of creating a copy of the formula explicitly, which often caused a blowup 
in required memory, we can reason about the two cases sequentially. The rules 
in Fig. 5 extend ID by universal expansion in this spirit. 
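For contrast, here is the explicit copy that the rules in Fig. 5 avoid. The Python sketch below assumes the usual form of expansion for a 2QBF, ∀x ∀X′. ∃Y. φ ≡ ∀X′. ∃Y ∃Y′. φ[x ↦ ⊥] ∧ φ[x ↦ ⊤, Y ↦ Y′], and we assume this matches the definition in Sect. 2. The helper names and the offset-based renaming of fresh variables are hypothetical. The brute-force is_true check confirms, on the formula of Subsect. 3.3, that expanding x2 preserves the truth value while the clause set grows.

```python
from itertools import product

def cofactor(clauses, lit):
    """phi with literal `lit` fixed to true: drop satisfied clauses and
    remove the falsified literal -lit from the remaining ones."""
    return [tuple(l for l in c if l != -lit) for c in clauses if lit not in c]

def rename(clauses, Y, offset):
    """Fresh copy of the existentials: variable y in Y becomes y + offset."""
    return [tuple((abs(l) + offset) * (1 if l > 0 else -1) if abs(l) in Y else l
                  for l in c)
            for c in clauses]

def expand(clauses, x, Y, offset):
    """Expand universal x: phi[x := false] and phi[x := true, Y := Y']."""
    return cofactor(clauses, -x) + rename(cofactor(clauses, x), Y, offset)

def is_true(clauses, X, Y):
    """Brute-force semantics of forall X. exists Y. clauses (toy sizes only)."""
    def sat(a):
        return all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses)
    return all(any(sat({**dict(zip(X, xb)), **dict(zip(Y, yb))})
                   for yb in product([False, True], repeat=len(Y)))
               for xb in product([False, True], repeat=len(X)))

# Formula of Subsect. 3.3 (x1, x2 = 1, 2; y1..y4 = 3..6); expand x2.
PHI = [(1, -3), (2, -3), (-1, -2, 3), (-2, 4), (-3, 4), (2, 3, -4),
       (3, -5), (4, -5), (-3, 6), (-5, -6)]
EXPANDED = expand(PHI, 2, {3, 4, 5, 6}, offset=10)
```

Each expansion duplicates the existentials, so repeating it over many universals multiplies the formula size. That growth is the memory blowup which the sequential case distinction is meant to avoid.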

Using ASSUME we can, at any point, assume that a variable v in D(0), i.e.
a variable that has a unique Skolem function without any decisions, has a
particular value. This is represented by extending α by the corresponding literal
of v, which restricts the domain of the Skolem function that we try to construct
for subsequent deterministic and unconflicted checks. Invariant 1 and Lemma 5
already accommodate the case that α is not ⊤.

When we reach a point where D contains all variables, we cannot apply SAT,
as that requires α to be true. In this case, Invariant 1 only guarantees us that the
function we constructed is correct on the domain χ ∧ α. We can hence restrict
the domain for which we still need to find a Skolem function and strengthen
χ by ¬α. In particular, CLOSE maintains Invariant 4. When χ ends up being
equivalent to ⊥, Invariant 4 guarantees that the original formula is true. (In this
case we can reach the SAT state easily, as we know that from now on every
application of PROPAGATE must succeed.²)

Note that ASSUME does not restrict us to assumptions on single variables. 
Together with DECIDE and PROPAGATE it is possible to introduce variables with 
arbitrary definitions, add them to D(0), and then assume an outcome with the 
rule ASSUME. 


Example. Again, we consider the formula from Subsect. 3.3. Instead of the
reasoning steps described in Subsect. 3.3, we start by applying ASSUME with literal x2.
Whenever checking deterministic or unconflicted in the following, we will thus
restrict ourselves to universal assignments that set x2 to true. It is easy to check
that this allows us to propagate not only y1 and y2, but also y3. A decision (e.g.
δ′ = {(y4)}) for y4 allows us to also propagate y4 (this time without potential
for conflicts), arriving in state (Ready, φ.δ′, X ∪ {y1, y2, y3, y4}, ⊤, x2).

We can CLOSE this case, concluding that under the assumption x2 we have
found a Skolem function. We enter the state (Ready, φ, X, ¬x2, ⊤), which indicates
that in the future we only have to consider universal assignments with ¬x2. Also
for the case ¬x2, we cannot encounter conflicts for this formula. Expansion hence
allows us to prove this formula without any conflicts.


6 Experimental Evaluation 


We extended the QBF solver CADET [8] with the extensions described in Sects. 4
and 5. We use CADET-IR and CADET-E to denote the extensions of
CADET by inductive reasoning (Sect. 4) and universal expansion (Sect. 5),
respectively. We also combined both extensions and refer to this version as
CADET-IR-E. The experiments in this section evaluate these extensions against
the basic version of CADET and against other successful QBF solvers of
recent years, in particular GhostQ [33], RAReQS [32], Qesto [23], DepQBF [19]
in version 6, and CAQE [24,26]. For every solver except CADET and GhostQ,
we use Bloqqer [34] in version 031 as preprocessor. For our experiments, we used
a machine with a 3.6 GHz quad-core Intel Xeon processor and 32 GB of memory.
The timeout and memout were set to 600 s and 8 GB. We evaluated the
solvers on the benchmark sets of the last competitive evaluation of QBF solvers,
QBFEval-2017 [9].


² Technically, we could replace SAT by a rule that allows us to enter the SAT state
whenever χ is ⊥, which arguably would be more elegant. But that would require us
to introduce the CLOSE rule already for the basic ID inference system.


Fig. 6. Cactus plot comparing solvers on the QBFEval-2017 2QBF benchmark.


How Does Inductive Reasoning Affect the Performance? In Fig. 6 we see that
CADET-IR clearly dominates plain CADET. It also dominates all solvers that
rely on clause-level CEGAR and Bloqqer (CAQE, Qesto, RAReQS).

Only GhostQ beats CADET-IR and solves 5 more formulas (of 384). A closer
look revealed that there are many formulas for which CADET-IR and GhostQ
show widely different runtimes, hinting at potential for future improvement.

GhostQ is based on the CEGAR principle, but reconstructs a circuit rep- 
resentation from the clauses instead of operating on the clauses directly [33]. 
This makes GhostQ a representative of QBF solvers working with so-called
“structured” formulas (i.e. not CNF). CADET, on the other hand, refrains from 
identifying logic gates in CNF formulas and directly operates with the “unstruc- 
tured” CNF representation. In the ongoing debate in the QBF community on the 
best representation of formulas for solving quantified formulas, our experimental 
findings can thus be interpreted as a tie between the two philosophies. 


Is the Inductive Reasoning Extension Just a Portfolio Approach? To settle this
question, we created a version of CADET-IR, called IR-only, that exclusively
applies inductive reasoning by generating assignments to the universals and
applying INDUCTIVEREFINEMENT. This version of CADET does not learn any
clauses, but otherwise uses the same code as CADET-IR. On the QBFEval-2017
benchmark, IR-only and CADET together solved 235 problems within the time
limit, while CADET-IR solved 243 problems. That is, even though the
combined runtime of CADET and IR-only was twice the runtime of CADET-IR,
they solved fewer problems. CADET-IR also uniquely solved 22 problems. This
indicates that CADET-IR improves over the portfolio approach.


Fig. 7. Cactus plot comparing solver performance on the Hardware Fixpoint formulas.
Some but not all of these formulas are part of QBFEval-2017. The formulas encode
diameter problems that are known to be hard for classical QBF search algorithms [35].


How Does Universal Expansion Affect the Performance? CADET-E clearly dom- 
inates plain CADET on QBFEval-2017, but compared to CADET-IR and some 
of the other QBF solvers, CADET-E shows mediocre performance overall. How- 
ever, for some subsets of formulas, such as the Hardware Fixpoint formulas shown 
in Fig. 7, CADET-E dominated CADET, CADET-IR, and all other solvers. We 
also combined the two extensions of CADET to obtain CADET-IR-E. While this 
helped to improve the performance on the Hardware Fixpoint formulas even fur- 
ther, it did not change the overall picture on QBFEval-2017. 


7 Conclusion 


Reasoning in quantified logics is one of the major challenges in computer-aided 
verification. Incremental Determinization (ID) introduced a new algorithmic 
principle for reasoning in 2QBF and delivered first promising results [8]. In this 
work, we formalized and generalized ID to improve the understanding of the 
algorithm and to enable future research on the topic. The presentation of the 
algorithm as a set of inference rules has allowed us to disentangle the design 
choices from the principles of the algorithm (Sect. 3). Additionally, we have 
explored two extensions of ID that both significantly improve the performance: 
The first one integrates the popular CEGAR-style algorithms and Incremental 
Determinization (Sect. 4). The second extension integrates a different type of 
reasoning termed universal expansion (Sect. 5). 
Abstract. The resolution proof system has been enormously helpful in
deepening our understanding of conflict-driven clause-learning (CDCL) 
SAT solvers. In the interest of providing a similar proof complexity- 
theoretic analysis of satisfiability modulo theories (SMT) solvers, we 
introduce a generalization of resolution called Res(T). We show that 
many of the known results comparing resolution and CDCL solvers lift 
to the SMT setting, such as the result of Pipatsrisawat and Darwiche 
showing that CDCL solvers with “perfect” non-deterministic branching 
and an asserting clause-learning scheme can polynomially simulate gen- 
eral resolution. We also describe a stronger version of Res(T), Res*(T), 
capturing SMT solvers allowing introduction of new literals. We analyze 
the theory EUF of equality with uninterpreted functions, and show that 
the Res*(EUF) system is able to simulate an earlier calculus introduced 
by Bjørner and de Moura for the purpose of analyzing DPLL(EUF). Further,
we show that Res*(EUF) (and thus SMT algorithms with clause
learning over EUF, new literal introduction rules and perfect branching) 
can simulate the Frege proof system, which is well-known to be far more 
powerful than resolution. Finally, we prove under the Exponential Time 
Hypothesis (ETH) that any reduction from EUF to SAT (such as the 
Ackermann reduction) must, in the worst case, produce an instance of 
size Ω(n log n) from an instance of size n.


1 Introduction 


It is common practice in formal verification literature to view SAT/SMT solver 
algorithms as proof systems and study their properties, such as soundness,
completeness and termination, using proof-theoretic tools [GHN+04, ORC09, Tin12].
However, much work remains in applying the powerful lens of proof complexity 
theory in understanding the relative power of these solvers. All too often, the 
power of SAT and SMT (satisfiability modulo theories) solving algorithms is 
determined by how they perform at the annual SAT or SMTCOMP competi- 
tions [BHJ17,smt]. While such competitions are an extremely useful practical 
test of the power of solving methods, they do not address fundamental questions 
such as which heuristics are truly responsible for the power of these solvers or
what are the lower bounds for these methods when viewed as proof systems. 

Solvers, by their very nature, are a tangled jumble of heuristics that interact 
with each other in complicated ways. Many SMT solvers run into hundreds of 
thousands of lines of code, making them very hard to analyze. It is often difficult 
to discern which sets of heuristics are universally useful, which sets are tailored 
to a class of instances, and how their interactions actually help solver perfor- 
mance. A purely empirical approach, while necessary, is far from sufficient in 
deepening our understanding of solver algorithms. What is needed is an appro- 
priate combination of empirical and theoretical approaches to understanding the 
power of solvers. Fortunately, proof complexity theory provides a powerful lens 
through which to mathematically analyze solver algorithms as proof systems and 
to understand their relative power via lower bounds. The value of using proof 
complexity theory to better understand solving algorithms as proof systems is 
three-fold: first, it allows us to identify key ingredients of a solving algorithm and
prove lower bounds for non-deterministic combinations of such ingredients. That
is, we can analyze the countably many variants of a solving algorithm in a unified 
manner via a single analysis, rather than analyzing different configurations of 
the same set of proof-theoretic ingredients; second, proof complexity-theoretic 
tools allow us to recognize the relative power of two proof systems, via appro- 
priate lower bounds, even if both have worst-case exponential time complexity; 
finally, proof complexity theory brings with it a rich literature and connections 
to other sub-fields of complexity theory (e.g., circuit complexity) that we may be 
able to leverage in analyzing solver algorithms. Many proof complexity theorists
and logicians have long recognized this, and there is a rich literature on the
analysis of SAT solving algorithms such as DPLL and conflict-driven clause-learning
(CDCL) solvers [PD11,BKS04,BBJ14,AFT11]. In this paper, we lift some of 
these results to the setting of SMT solvers, following the work of Bjørner and de
Moura [BM14].

Our focus is primarily the proof complexity-theoretic analysis of the
“DPLL(T) method”¹, the prime engine behind many modern SMT solvers
[GHN+04, Tin12]. (While other approaches to solving first-order formulas have
been studied, DPLL(T) remains a fundamental and dominant approach.) A
DPLL(T)-based SMT solver takes as input a Boolean combination of first-order
theory T atoms or their negations (a.k.a. theory literals), and decides whether
such an input is satisfiable.


¹ Prior to the mid-2000s, SAT researchers and complexity theorists confusingly used the
term DPLL to refer both to the original algorithm proposed by Davis, Putnam,
Loveland, and Logemann in 1960, and to the newer algorithm by João Marques-Silva
and Karem Sakallah that added clause learning to DPLL (proposed in 1996),
even though they are vastly different in power as proof systems. We will follow the
literature and use DPLL(T) to indicate a “modern” SMT solver with clause learning
and restarts, but we urge SMT solver researchers to use the more appropriate term
CDCL(T) rather than DPLL(T) to refer to the lazy approach to SMT.


Informally, a typical DPLL(T)-based SMT solver S is essentially a CDCL Boolean
SAT solver that calls out to a theory solver $T_S$ during its search to perform theory
propagations and theory conflict-clause learning. The typical theory solver $T_S$ is
designed to accept only quantifier-free conjunctions of theory T literals (the T in
the term DPLL(T)), while the SAT solver “handles” the Boolean structure of input
formulas. Roughly speaking, the SMT solver S works as follows. First, it constructs
a Boolean abstraction $B_F$ of the input formula F by replacing theory literals with
Boolean variables. If $B_F$ is UNSAT, S returns UNSAT. Otherwise, satisfying
assignments to the Boolean abstraction $B_F$ are found, which in turn correspond to
conjunctions of theory literals. Such conjunctions are then input to the theory
solver $T_S$, which may deduce new implied formulas (via theory propagation and
conflict clause learning) that are then used to help prune the search space of
assignments to F. The solver S returns SAT upon finding a satisfying theory
assignment to the input F, and UNSAT otherwise. (For further details, we refer the
reader to the excellent exposition on this topic by Tinelli [Tin12].)
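The loop just described can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the algorithm of any particular solver: the theory is E, an atom is a pair of constant names (a, b) read as $a = b$, the SAT solver on $B_F$ is replaced by brute-force enumeration, and the theory solver $T_S$ is a union-find check on conjunctions of equalities and disequalities. The names `check_conjunction` and `lazy_smt` are hypothetical.

```python
from itertools import product


def check_conjunction(lits):
    """Theory solver T_S for E (sketch).

    lits: list of ((a, b), is_eq) pairs, read as a = b or a != b.
    Returns True iff the conjunction is satisfiable in the theory of equality.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    # Merge the equivalence classes of every asserted equality.
    for (a, b), is_eq in lits:
        if is_eq:
            parent[find(a)] = find(b)
    # A disequality a != b is violated iff a and b ended up in the same class.
    return all(find(a) != find(b) for (a, b), is_eq in lits if not is_eq)


def lazy_smt(clauses):
    """Lazy SMT loop over E (sketch).

    clauses: list of clauses, each a list of (atom, polarity) with atom = (a, b).
    Returns "SAT" or "UNSAT".
    """
    atoms = sorted({atom for clause in clauses for atom, _ in clause})
    learned = []  # T-conflict clauses added so far
    while True:
        model = None
        # Brute-force stand-in for the SAT solver on the Boolean abstraction B_F.
        for values in product([False, True], repeat=len(atoms)):
            assign = dict(zip(atoms, values))
            if all(any(assign[atom] == pol for atom, pol in clause)
                   for clause in clauses + learned):
                model = assign
                break
        if model is None:
            return "UNSAT"  # B_F together with the learned clauses is UNSAT
        lits = list(model.items())
        if check_conjunction(lits):
            return "SAT"  # the chosen theory literals are jointly E-satisfiable
        # Crude T-conflict clause: forbid exactly this assignment.
        learned.append([(atom, not value) for atom, value in lits])
```

For instance, on the unit clauses $a = b$, $b = c$, $a \neq c$ the only propositional model of $B_F$ is rejected by the theory solver, so `lazy_smt` returns UNSAT. A real solver would use CDCL instead of enumeration and would return a small T-conflict clause rather than blocking an entire assignment.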


A Brief Description of the Res(T) Proof System: To abstractly model a 
DPLL(T)-based SMT solver S, we define a proof system Res(T) below for a given 
first-order theory T. The Res in Res(T) refers to the general resolution proof 
system for Boolean logic. Without loss of generality, we assume that Res(T) 
accepts theory formulas in conjunctive normal form (CNF). Let F denote a
CNF with propositional variables representing atoms from an underlying theory
T, and for any clause C let vars(C) denote the set of propositional atoms
occurring in C. The proof rules of Res(T) augment the resolution proof rule as
follows: a proof in Res(T) is a general resolution refutation of F, where at any
step the theory solver can add to the set of clauses an arbitrary clause C
such that $T \models C$ and every propositional atom in vars(C) occurs in the original
formula. That is, each line of a Res(T) proof is deduced by one of the following
rules:


Resolution. $C \vee \ell,\ D \vee \neg\ell \vdash C \vee D$, for previously derived clauses $C \vee \ell$ and $D \vee \neg\ell$.
Theory Derivation. $\vdash C$ for any clause C such that $T \models C$ and for which every
theory literal in C occurs in the input formula.


For example, a theory of linear arithmetic may introduce a clause
$(x > 5 \vee y > 7 \vee x + y \le 12)$, which can then be used in the subsequent steps of a resolution
proof, provided each of those literals occurred in the original CNF formula F.
The filter on the theory rule of Res(T) models the fact that in many modern 
SMT solvers, the “theory solver” is only allowed to reason about literals which 
already occur in the formula. Recent solvers such as Z3, Yices [Z3, Yic] break this 
rule and allow the theory solver to introduce new propositional atoms; to model 
this we introduce the stronger variant Res*(T) with a strengthened theory rule: 


Strong Theory Derivation. $\vdash C$ for any clause C such that $T \models C$.


1.1 Our Contributions


We prove the following results about the two systems Res(T), Res*(T) and the 
complexity of SMT solving. 


1. We show that DPLL(T) with an arbitrary asserting clause learning scheme
and non-deterministic branching and theory propagation is equivalent (as a
proof system) to Res(T) for any theory T. More precisely: if the theory solver
in DPLL(T) can only reason about literals in the input, then it is equivalent
to Res(T); if it can reason about arbitrary literals then it is equivalent to
Res*(T). (See Sect. 4)

2. When the theory T is E, the theory of pure equalities, Res*(E) is equivalent
to the SP(E) system of Bjørner et al. [BDdM08], which seems to have no
efficient proofs of the pigeonhole principle (PHP). (See Sect. 5.1)

3. When the theory T is EUF (equality with uninterpreted function symbols),
the proof system Res*(EUF) can simulate E-Res, a different generalization of
resolution introduced by Bjørner and de Moura [BM14] for the purpose of
simulating standard implementations of DPLL(EUF). Furthermore, Res*(EUF)
can simulate the powerful Frege proof system. (See Sect. 5.2)

4. When T is LA, a theory of linear arithmetic over a set of numbers contain- 
ing integers, Res(LA) can polynomially simulate the system R(lin) of Raz and 
Tzameret [RT08], and thus has polynomial size proofs of several hard tautolo- 
gies such as the pigeonhole principle and Tseitin tautologies. (See Sect. 5.3) 

5. Finally, we prove under the Exponential Time Hypothesis (ETH) that any
reduction from EUF to SAT (such as the Ackermann reduction) must, in the
worst case, produce an instance of size $\Omega(n \log n)$ from an instance of size n.
(See Sect. 6)


These results seem to suggest that our generalization is the “right” proof sys- 
tem corresponding to DPLL(T), as it characterizes proofs produced by DPLL(T) 
and it can simulate other proof systems introduced in the literature to capture 
DPLL(T) for particular theories T. 


1.2 Previous Work 


Among the previous proof systems combining resolution with non-propositional
reasoning are the R(CP) proof system of [Kra98], where propositional variables are
replaced with linear inequalities, and R(lin) introduced by Raz and Tzameret 
[RT08], which reasons with linear equalities, modifying the resolution rule. R(lin) 
polynomially simulates R(CP) when all coefficients in an R(CP) proof are polyno- 
mially bounded. In the SMT community, Bjørner et al. [BDdM08,BM14] intro- 
duced calculi capturing the power of resolution over the theory of equality and 
equality with uninterpreted functions. They show that these systems capture 
the power of resolution over the corresponding theories, extended with rules for 
introducing new atoms. Our results supersede previous work since our simula- 
tions hold for any first-order theory T. 
2 Preliminaries


2.1 Propositional Proof Systems 


In this paper, all proof systems are defined by a set of “allowed lines” equipped 
with a list of deduction rules that allow us to deduce new lines from old ones. We 
first recall the resolution system, which is a refutation system for propositional 
formulas in CNF (product of sums) form. The lines of a resolution proof are
disjunctions of Boolean literals called clauses, and these lines are equipped with
a single deduction rule called the resolution rule: given two clauses of the form
$C \vee \ell$, $D \vee \neg\ell$ we deduce the clause $C \vee D$. If $\varphi = C_1 \wedge C_2 \wedge \cdots \wedge C_m$ is an
unsatisfiable CNF formula then a resolution refutation of $\varphi$ is a sequence of
clauses $C_1, C_2, \ldots, C_m, C_{m+1}, \ldots, C_t$ where $C_t$ is the empty clause and all clauses
$C_i$ with $i > m$ are deduced from earlier clauses by applying the resolution rule.
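As a small illustration of this definition (a sketch with our own encoding), the checker below represents literals as nonzero integers in the DIMACS style, with -v the negation of v, and clauses as frozensets. Each derivation step names two earlier lines and the variable resolved on; `check_refutation` is a hypothetical helper, not part of any library.

```python
def check_refutation(cnf, steps):
    """Check a resolution refutation (sketch).

    cnf:   list of frozensets of nonzero ints (the input clauses C_1..C_m).
    steps: list of (i, j, v); each appends the resolvent of line i (containing v)
           and line j (containing -v) on variable v.
    Returns True iff every step is a valid resolution step and the last line is empty.
    """
    lines = list(cnf)
    for i, j, v in steps:
        if not (0 <= i < len(lines) and 0 <= j < len(lines)):
            return False  # may only resolve previously derived lines
        c, d = lines[i], lines[j]
        if v not in c or -v not in d:
            return False  # the two clauses do not clash on v
        lines.append(frozenset((c - {v}) | (d - {-v})))  # the resolvent C ∨ D
    return bool(lines) and not lines[-1]
```

For example, with $x_1, x_2$ encoded as 1, 2, the four clauses $x_1 \vee x_2$, $\neg x_1 \vee x_2$, $x_1 \vee \neg x_2$, $\neg x_1 \vee \neg x_2$ are refuted by the steps `[(0, 1, 1), (2, 3, 1), (4, 5, 2)]`: the first two steps derive $x_2$ and $\neg x_2$, and the last derives the empty clause.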

Observe that clauses satisfy a subsumption principle: if C, D are clauses
such that $C \subseteq D$ then every assignment satisfying C also satisfies D. This
implies that we can safely add a weakening rule to resolution which, from a
clause C, derives the clause $C \vee x$ for any literal x not already occurring in C.
The subsumption principle implies that this weakening rule does not change the
power of resolution, as any use of a clause $D \supseteq C$ can be eliminated or replaced
with C.

We also consider the Frege proof system, which captures standard “textbook-style”
proofs. The lines of a Frege system are given by arbitrary Boolean formulas,
and from two Boolean formulas we can deduce any new Boolean formula which
follows under typical Boolean reasoning (e.g. deducing the conjunction of two
formulas, the disjunction of their negation, and so on). Crucially, Frege proofs allow
applying a generalized “resolution rule” to arbitrary polynomial-size formulas. 

The power of different propositional proof systems is compared using the
notion of a polynomial simulation (p-simulation). Proof system A polynomially
simulates (or p-simulates) proof system B if, for every unsatisfiable formula
F, the shortest refutation of F in A is at most polynomially longer than
the shortest refutation of F in B. For example, the Frege proof
system p-simulates the Resolution proof system, but the converse is widely con- 
jectured not to hold. 


2.2 First-Order Theories 


In this paper we study proof systems for first-order theories. For the sake of com- 
pleteness we recall some relevant definitions from first-order logic, but remark 
that this is essentially standard fare. 

Let $\mathcal{L}$ be a first-order signature (a list of constant symbols, function symbols,
and predicate symbols). Given a set of $\mathcal{L}$-sentences A and an $\mathcal{L}$-sentence B we
write $A \models B$ if every model of A is also a model of B. A first-order theory (or
simply a theory) is a set of $\mathcal{L}$-sentences that is consistent (that is, it has a model)
and is closed under $\models$. The decision problem for a theory T is the following: given
a set S of literals over $\mathcal{L}$, decide if there is a model M of T such that $M \models S$.
The satisfiability problem for T, also denoted T-SAT, is the following: given a
quantifier-free formula F in T in conjunctive normal form (CNF), decide if there
is a model M of T such that $M \models F$.

A simple example of a theory is E, the conjunctive theory of equality. The 
signature of E contains a single predicate symbol = and an infinite list of con- 
stant symbols. It is axiomatized by the standard axioms of equality (reflexivity, 
symmetry, and transitivity), and a sample sentence in E would be the formula
$a \neq b \vee b \neq c \vee a = c$, which encodes the transitivity of equality between the
constant symbols a, b, and c. Following the SMT literature, we will call terms
from the theory (such as a and b) theory variables, and the atoms derived from
these terms (such as $a \neq b$ or $a = c$) will be called theory literals or just literals.
We note that the decision problem for E can be decided very efficiently [DST80]; 
in contrast, the satisfiability problem for E is easily seen to be NP-complete. 


3 Res(T): Resolution Modulo Theories 


We now define a generalization of resolution which captures the type of reasoning 
modulo a first-order theory that is common in SMT solvers. We give two variants: 
the first, denoted Res(T), allows the deduction of any clause C of theory literals 
such that $T \models C$ and for which every literal in C already occurs in the input
formula. This is intended to model “standard” lazy SMT solvers [NOT06] which
only reason about literals in the input formula.

The second, more powerful variant is denoted Res*(T), and allows the deduction
of any clause of literals C such that $T \models C$, even if the new clause contains
literals which do not occur in the input formula. We introduce this to explore 
the power of lazy SMT solvers that are allowed to introduce new literals from 
the theory, and note that there are well-known examples in the SMT literature 
which show that introducing new literals can drastically decrease the length of 
refutations (e.g. the diamond equalities [BDdM08]). Indeed, in Sect. 5.2 we show 
that this power can drastically increase the proof theoretic strength of SMT 
solvers. 


Definition 1 (Res(T), Res*(T)). Let T be a theory and let F be a quantifier-free
CNF formula over T. The lines of a Res(T) (Res*(T)) proof are quantifier-free
clauses of theory literals deduced from F and T by the following derivation
rules.


Resolution. $C \vee \ell,\ D \vee \neg\ell \vdash C \vee D$.
Weakening. $C \vdash C \vee \ell$ for any theory literal $\ell$ occurring in the input formula.


Theory Derivation (Res(T)). $\vdash C$ for any clause C satisfying $T \models C$ and for
which every literal in C occurs in the input formula.


Strong Theory Derivation (Res*(T)). $\vdash C$ for any clause C satisfying $T \models C$.
A refutation of F is a proof in which the final line is the empty clause.
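To make the difference between the two theory rules concrete, here is a sketch for the theory E only (an assumption; the definition covers any T). A literal is a triple (a, b, is_eq) over constant symbols, and a clause is E-valid iff the conjunction of its negated literals is E-unsatisfiable, which a union-find check decides. Following the description of Res(T) in the introduction, we apply the occurrence filter to atoms, so either polarity of an input atom is allowed. All function names are hypothetical.

```python
def e_valid(clause):
    """True iff T_E |= clause, i.e. the negation of the clause is E-unsatisfiable."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    negated = [(a, b, not is_eq) for a, b, is_eq in clause]
    for a, b, is_eq in negated:
        if is_eq:
            parent[find(a)] = find(b)
    # The negation is unsatisfiable iff some negated disequality lies inside one class.
    return any(find(a) == find(b) for a, b, is_eq in negated if not is_eq)


def atom(lit):
    a, b, _ = lit
    return frozenset((a, b))  # a = b and b = a denote the same atom


def theory_step_ok(clause, input_formula, strong=False):
    """Check one theory step: strong=False for Res(E), strong=True for Res*(E)."""
    if not e_valid(clause):
        return False
    if strong:
        return True  # Strong Theory Derivation: any E-valid clause is allowed
    input_atoms = {atom(lit) for c in input_formula for lit in c}
    return all(atom(lit) in input_atoms for lit in clause)
```

Against the input $a = b$, $b = c$, $a \neq c$, the clause $a \neq b \vee b \neq c \vee a = c$ passes both checks, while the equally valid $a \neq d \vee d \neq c \vee a = c$ passes only the strong one, because its atoms over the new constant d do not occur in the input.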
It is easy to see that both Res(T) and Res*(T) are sound since all rules are
sound, and completeness follows from a straightforward modification of the usual 
proof of resolution completeness (see, e.g. Jukna [Juk12]). 

Technically speaking, Res(T) is not a (formal) propositional proof system as 
defined by Cook and Reckhow [CR79] since the proofs may not be efficiently 
verifiable if deductions from the theory T are computationally difficult to verify. 
However, all theories considered in this paper (cf. Sect. 5) are very efficiently
decidable, and thus the corresponding Res(T) proofs are efficiently verifiable. 

Note that the clauses introduced by the theory derivations are arbitrary the- 
orems of T; this means there is no direct information exchange between the 
resolution proof and the theory. It is enough to derive clauses in the theory 
derivation rules rather than arbitrary formulas since every axiom can be written 
in CNF form, and introduced as a sequence of clauses. The strong theory deriva- 
tion rule can introduce new theory literals which might not have been present in 
the initial formula—we emphasize that the new theory literals can even contain 
theory variables (i.e. first-order terms) that did not occur in the original formula. 
We will see that this ability to introduce new literals seems to give Res*(T) extra
power over general resolution. 


4 Lazy SMT Solvers and Res(T) 


In this section we show that lazy SMT solvers and resolution modulo theories 
are polynomially-equivalent as proof systems, provided that the SMT solvers are 
given a set of branching and restart decisions a priori. 

We model SMT solvers by the algorithm schema² DPLL(T), which is given
in Algorithm 1. Using this schema we prove two results: first, if the theory
solver in DPLL(T) can only reason about literals occurring in its input formula,
then DPLL(T) is polynomially equivalent to Res(T). Second, if the theory solver
is strengthened so that it is allowed to introduce new literals then the result- 
ing solver can polynomially simulate Res*(T). The proofs of these results use 
techniques developed for comparing Boolean CDCL solvers and resolution by 
Pipatsrisawat and Darwiche [PD11]. 


² In the literature, SMT solvers are typically defined as abstract state-transition
systems (see, for instance, [GHN+04,BM14]); we have chosen to define them instead as an
algorithm schema (cf. Algorithm 1) inspired by the abstract definition of a CDCL
solver by Pipatsrisawat and Darwiche [PD11].


Algorithm 1. DPLL(T)


Input: CNF formula F over T-literals;
Output: SAT or UNSAT
Let $\sigma = \emptyset$ be an initially empty partial assignment of T-literals;
Let $\Gamma$ be an initially empty collection of learned clauses;
while true do
    if $F \wedge \Gamma \wedge \sigma \vdash_1 \emptyset$ then
        if $\sigma = \emptyset$ then
            return UNSAT;
        Apply the clause learning scheme to learn a conflict clause C, add it to $\Gamma$;
        Backjump $\sigma$ to the second highest decision level in C;
    else if $\sigma \models_T \emptyset$ then
        Apply the T-conflict scheme to learn a conflict clause C, add it to $\Gamma$;
        Backjump $\sigma$ to the second highest decision level in C;
    else
        if $\sigma$ satisfies F then
            return SAT;
        Apply the restart scheme to decide whether or not to restart;
        if restart then
            Set $\sigma = \emptyset$;
            Restart loop;
        Apply the T-propagate scheme;
        Unit propagate literals to completion and update $\sigma$ accordingly;
        Apply the branching scheme to choose a decision literal $\ell$, set $\sigma = \sigma \cup \{\ell\}$;


If T is a theory and A, B are formulas over T then we write $A \models_T B$ as a
shorthand for $T \cup \{A\} \models B$ (i.e. every model of the theory T that satisfies A
also satisfies B). We also define unit resolution, which describes the action of 
the unit propagator. 


Definition 2 (Unit Resolution). Let F be a collection of clauses over an
arbitrary theory T. A clause C is derivable from F by unit resolution if there
exists a resolution proof from F of C such that in each application of the
resolution rule, one of the clauses is a unit clause. If C is derivable from F by unit
resolution then we write $F \vdash_1 C$. If $F \vdash_1 \emptyset$ then we say F is unit refutable,
otherwise it is unit consistent.
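Unit refutability can be tested by unit propagation: it is a standard fact that $F \vdash_1 \emptyset$ exactly when propagating unit clauses to a fixpoint reaches a clause whose literals are all false. The sketch below uses DIMACS-style integer literals and ignores the theory, since unit resolution is purely propositional; the function names are hypothetical.

```python
def unit_propagate(clauses, assumptions=()):
    """Unit-propagate to a fixpoint.

    clauses: iterable of lists of nonzero ints; assumptions: literals set first.
    Returns (assigned_literals, conflict).
    """
    clauses = list(clauses)
    assigned = set()
    for lit in assumptions:
        if -lit in assigned:
            return assigned, True
        assigned.add(lit)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause already satisfied
            open_lits = {lit for lit in clause if -lit not in assigned}
            if not open_lits:
                return assigned, True  # every literal falsified: conflict
            if len(open_lits) == 1:
                assigned |= open_lits  # the remaining literal is forced
                changed = True
    return assigned, False


def unit_refutable(clauses):
    """F |-_1 (empty clause) iff unit propagation from no assumptions hits a conflict."""
    return unit_propagate(clauses)[1]
```

For instance, `unit_refutable([[1, 2], [-1, 2], [1, -2], [-1, -2]])` is False: these four clauses are unsatisfiable, yet none is a unit clause, so the set is unit consistent. This is exactly the kind of formula covered by the hypothesis of Lemma 6.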


A DPLL(T) algorithm is defined by specifying algorithms for each of the
“schemes” named in Algorithm 1:


Clause Learning Scheme. When a clause in the database is falsified by the 
current partial assignment, the Clause Learning Scheme is applied to learn 
a new clause C which is added to the database of stored clauses. 
Restart Scheme. The solver applies the Restart Scheme to decide whether
or not to restart its search, discarding the current partial assignment $\sigma$ and
saving the list of learned clauses.


Branching Scheme. The Branching Scheme is applied to choose an unassigned
variable from the formula F or from the learned clauses $\Gamma$ and assign the
variable a Boolean value.


T-Propagate Scheme. During search, the DPLL(T) solver can hand the theory 
solver the current partial assignment $\sigma$ and ask whether or not it should
unit-propagate a literal; if a unit propagation is possible the theory solver will return
a clause C from the theory witnessing this unit propagation. 


T-Conflict Scheme. When the theory solver detects that the current partial
assignment $\sigma$ contradicts the theory, the T-Conflict Scheme is applied to learn
a new clause of literals C with $\neg C \subseteq \sigma$, which is added to the clause database.

We pay particular attention to the specification of the T-propagate scheme.
The next definition describes two types of propagation schemes: a weak propa- 
gation scheme is only allowed to return clauses which propagate literals in the 
formula, while the more powerful strong propagation scheme returns a clause of 
literals from the theory that may contain new literals. 


Definition 3. A weak T-propagate scheme is an algorithm which takes as input
a conjunction of theory literals $\sigma$ over T and returns (if possible) a clause
$C = \neg\sigma \vee \ell$ where $T \models C$ and the literal $\ell$ occurs in the input formula of the DPLL(T)
algorithm.

A strong T-propagate scheme is an algorithm which takes as input a conjunction
of literals $\sigma$ over T, and if possible returns a clause C of literals from T
such that $T \models C$ and $\neg\sigma \subseteq C$. An algorithm equipped with a strong T-propagate
scheme will be called a DPLL*(T) solver.


A DPLL(T) algorithm equipped with a weak T-propagation scheme is equiv- 
alent to the basic theory propagation rules found in SMT solvers (see, for 
example, [BM14,NOT06]). For technical convenience we assume that the weak
T-propagate scheme adds a clause to the database “certifying” the unit prop- 
agation, while in actual implementations the clause would likely not be added 
and the literal would simply be propagated. Recent SMT solvers [Yic,Z3] have 
strengthened the interaction between the SAT solver and the theory solver, allow- 
ing the theory solver to return constraints over new variables; this is modelled 
very generally by strong T-propagate schemes. 


4.1 DPLL(T) and Res(T) 


We now prove the main result of this section, after introducing some preliminaries 
from [PD11] that are suitably modified for our setting. Fix a theory T. An
assignment trail is a sequence of pairs $\sigma = \{(\ell_i, d_i)\}_{i=1}^{t}$ where each $\ell_i$ is
a literal from the theory and each $d_i \in \{d, p\}$, indicating that the literal was
set by a decision or a unit propagation. The decision level of a literal $\ell_i$ in $\sigma$
is the number of decision literals occurring in $\sigma$ up to and including $\ell_i$. Given
an assignment trail $\sigma$ and a clause C we say that C is asserting if it contains
exactly one literal occurring in $\sigma$ at the highest decision level. A clause learning
scheme is asserting if all conflict clauses produced by the scheme are asserting
with respect to the assignment trail at the time of conflict.
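The sketch below computes decision levels from a trail and tests the asserting property, using DIMACS-style integer literals and the kinds "d" and "p" from the definition. Two readings are our assumptions: since a conflict clause is falsified by $\sigma$, we locate each clause literal on the trail through its complement, and we take the highest decision level to be the current (deepest) level of the trail. The helper names are hypothetical.

```python
def decision_levels(trail):
    """Map each literal on the trail to its decision level.

    trail: list of (literal, kind) with kind "d" (decision) or "p" (propagation).
    The level of a literal is the number of decisions up to and including it.
    Returns (levels, current_level).
    """
    levels, level = {}, 0
    for lit, kind in trail:
        if kind == "d":
            level += 1
        levels[lit] = level
    return levels, level


def is_asserting(clause, trail):
    """True iff exactly one literal of the (falsified) clause is falsified at the
    current, highest decision level of the trail."""
    levels, top = decision_levels(trail)
    at_top = [lit for lit in clause if levels.get(-lit) == top]
    return len(at_top) == 1
```

With the trail `[(1, "d"), (2, "p"), (3, "d"), (4, "p"), (5, "p")]`, the clause `[-1, -4]` is asserting (only 4 sits at level 2), whereas `[-4, -5]` is not, since both of its literals are falsified at the top level.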

An extended branching sequence is an ordered sequence $B = \{\beta_1, \beta_2, \ldots, \beta_t\}$
where each $\beta_i$ is either (1) a literal from the theory, (2) a symbol $x \in \{R, NR\}$,
to denote a restart or no-restart, respectively, or (3) a clause C such that $T \models C$.
Intuitively, extended branching sequences are used to provide a DPLL(T) solver
with a list of instructions for how to proceed in its execution. For instance,
whenever the solver calls the Branching Scheme, we consume the next $\beta_i$ from
the sequence, and if it is a literal from the theory then the solver assigns that 
literal. Similarly, when the DPLL(T) solver calls the Restart Scheme it uses the 
branching sequence to dictate whether or not to restart, and when the solver 
calls the T-propagate scheme it uses the sequence to dictate which clause to 
learn. If the symbol does not correctly match the current scheme being called 
then the solver halts in error, and if the branching sequence is empty, then the 
algorithm proceeds using the heuristics defined by the algorithm. 

We now introduce absorbed clauses (and their duals, empowering clauses), 
which were originally defined by Pipatsrisawat and Darwiche [PD11] and independently
by Atserias et al. [AFT11]. One should think of the absorbed clauses
as being learned “implicitly”: they may not necessarily appear in F, but, if we
assign all but one of the literals in the clause to false then unit propagation in 
DPLL(T) will set the final literal to true. 


Definition 4 (Empowering Clauses). Let F be a collection of clauses over
an arbitrary theory T and let A be a DPLL(T) solver. Let $\alpha$ be a conjunction
of literals, and let $C = (\alpha \Rightarrow \ell)$ be a clause. We say that C is empowering
with respect to F at $\ell$ if the following holds: (1) $F \cup T \models C$, (2) $F \wedge \alpha$ is unit
consistent, and (3) any execution of A on F that satisfies $\alpha$ without setting $\ell$
does not unit-propagate $\ell$. The literal $\ell$ is said to be empowering. If items (1) and (2)
are satisfied but (3) is false then we say that the solver A and F absorb C at
$\ell$; if A and F absorb C at every literal then the clause is simply absorbed.


For example, consider the set of clauses $(x \vee y \vee z)$, $(\neg z \vee a)$, $(\neg a \vee b)$.
The clause $(x \vee y \vee b)$ is absorbed by this set of clauses as, for instance, if we
falsify x and y then the unit propagator will force b to be set to true. Thus in
the DPLL(T) algorithm the unit propagator will behave as though this clause is
learned even though it is not (if we remove the final clause $\neg a \vee b$, then
$(x \vee y \vee b)$ is no longer absorbed: falsifying x and y now only propagates z and a).
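The example can be checked mechanically. The sketch below encodes x, y, z, a, b as the integers 1 to 5 and tests only the unit-propagation part of absorption (condition (3) of Definition 4), assuming conditions (1) and (2) hold; as a further simplification, a unit conflict is not counted as propagating the target literal. The function names are hypothetical.

```python
def propagates(clauses, assumptions, target):
    """True iff unit propagation from `assumptions` sets `target` to true.

    Clauses are lists of nonzero ints. A clause whose literals are all false is
    simply skipped (our simplification: a conflict does not count as propagating).
    """
    assigned = set(assumptions)
    changed = True
    while changed and target not in assigned:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # already satisfied
            open_lits = {lit for lit in clause if -lit not in assigned}
            if len(open_lits) == 1:
                assigned |= open_lits  # forced unit literal
                changed = True
    return target in assigned


def absorbed(clauses, c):
    """Condition (3) only: for every literal of c, falsifying the other
    literals of c lets unit propagation set that literal."""
    return all(propagates(clauses, [-other for other in c if other != lit], lit)
               for lit in c)
```

With the encoding above, `absorbed([[1, 2, 3], [-3, 4], [-4, 5]], [1, 2, 5])` returns True, and dropping the last clause makes it return False, matching the discussion of the example.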

The next lemma shows that for any theory clause C, there is an extended 
branching sequence which can be applied to absorb that clause. 


Lemma 5. Let F be an unsatisfiable CNF over a theory T and let $\Pi$ be any
Res(T) proof from F. Let $\Pi_T \subseteq \Pi$ be the set of clauses in $\Pi$ derived using
the theory rule. For any DPLL(T) algorithm A there is an extended branching
sequence B such that after applying B to the solver A every clause in $\Pi_T$ will
be absorbed.
Proof. Order $\Pi_T$ arbitrarily as $C_1, C_2, \ldots, C_t$ and remove any clause that is
absorbed or already in F, as these are clearly already absorbed. We construct
B directly: add the negations of literals in $C_1$ to B until one literal remains,
and then add the clause $C_1$ to the extended branching sequence. By definition
the weak T-propagator will be called and will return $C_1$, adding it to the clause
database. Restart and continue to the next theory clause in order.


Our proof of mutual simulations between Res(T) and DPLL(T) crucially relies 
on the following technical lemma (which is a modified version of a lemma from 
[PD11]). 


Lemma 6. Let F be an unsatisfiable, unit-consistent CNF over literals from a
theory T and let $\Pi$ be any Res(T) proof from F. Let $\Pi_T$ be the set of clauses in
$\Pi$ derived using the theory rule. Then there exists a clause C in $\Pi$ that is both
empowering and unit-refutable with respect to $F \cup \Pi_T$.


Proof. Let $\Pi$ denote a Res(T)-refutation of F and assume without loss of
generality (by applying Lemma 5) that the first derived clauses in $\Pi$ are those in $\Pi_T$.
If every clause in $\Pi$ is unit-refutable from F, then the empty clause is
unit-refutable and thus F is not unit-consistent, which is a contradiction. So, assume
that there exists a clause $C_i$ which is the first clause in $\Pi$ by this ordering such
that it is not unit-refutable. Since $\Pi$ is a Res(T)-proof, $C_i$ is one of three types:
either it is a clause in F, it is a clause derived from the theory rule, or $C_i$ was
derived by applying the resolution rule to two clauses $C_j, C_k$. If $C_i \in F$ then
it is clearly unit-refutable, which is a contradiction. If $C_i$ was derived from the
theory rule then it is unit-refutable with respect to $\Pi_T$, which is again a
contradiction. Finally, suppose that $C_i$ was derived by applying the resolution rule
to clauses $C_j$ and $C_k$, and write $C_j = (\alpha \Rightarrow \ell)$, $C_k = (\beta \Rightarrow \neg\ell)$ where $\ell$ is the
resolved literal and $j, k < i$ in the ordering of clauses in $\Pi$. Since $C_j$ and $C_k$
are both unit-refutable, assume by way of contradiction that neither $C_j$ nor $C_k$
is empowering. It follows by definition that both clauses are absorbed at every
literal. Thus, if we consider $F \wedge \alpha \wedge \beta$, it follows by the absorption property that
$F \wedge \alpha \wedge \beta \vdash_1 \ell$ and $F \wedge \alpha \wedge \beta \vdash_1 \neg\ell$, which implies that $F \wedge \alpha \wedge \beta \vdash_1 \emptyset$. However,
$\neg C_i = \alpha \wedge \beta$, and thus we have concluded that $C_i$ is unit-refutable, which is a
contradiction. Thus at least one of $C_j$ or $C_k$ is both empowering and unit-refutable.


The gist of Lemma 6 is simple: if clauses $C \vee \ell$ and $D \vee \neg\ell$ are both absorbed
by a collection of clauses F, then asserting $\neg C \wedge \neg D$ in the DPLL solver will hit a
conflict since it will unit-imply both $\ell$ and $\neg\ell$. In the main theorem, proved next,
we show that empowering and unit-refutable clauses will be absorbed by the 
solver after sufficiently many restarts. 


Theorem 7. The DPLL(T) system with an asserting clause learning scheme, 
non-deterministic branching and T-propagation polynomially simulates Res(T). 
Equivalently: for any unsatisfiable CNF F over a theory T, and any Res(T)
refutation $\Pi$ of F there exists an extended branching sequence B such that running
a DPLL(T) algorithm on input F using B will refute F in time polynomial in
$|\Pi|$.
Proof. Let F be an unsatisfiable CNF over the theory T, and let $\Pi$ be a Res(T)
refutation of F. Let $\Pi_T \subseteq \Pi$ be the set of clauses in $\Pi$ derived using the
theory rule, and write $\Pi = C_1, C_2, \ldots, C_m$. As a first step, apply Lemma 5 and
construct an extended branching sequence B′ which leads to the absorption of
all clauses in $\Pi_T$. We prove the following claim, from which the theorem directly
follows.


Claim. Let C be any unit-refutable and empowering clause with respect to F. 
Then there exists an extended branching sequence B of polynomial size such 
that after applying B the clause C will be absorbed. 

Let $\ell$ be any empowering literal of C, and write $C = (\alpha \Rightarrow \ell)$. Let B be
any extended branching sequence in which all literals in $\alpha$ are assigned. Since
C is empowering, it follows that $F \wedge \alpha$ is unit-consistent. Extending B with the
decision literal $\neg\ell$ will therefore cause a conflict since C is unit-refutable. Let
C′ be the asserting clause obtained by applying the clause learning scheme to
$B \cup \{\neg\ell\}$. If $F \wedge C'$ absorbs C at $\ell$, then we are done and we continue to the next
empowering literal. Otherwise, we resolve whatever conflicts the solver needs to
resolve (possibly adding more learned clauses along the way) until the branching
sequence is unit-consistent.

Observe that after this process we must have that $F \wedge C' \vdash_1 \ell'$ where $\ell'$
is some literal at the same decision level as $\ell$, since the clause learning scheme
is asserting. Thus the number of literals at the maximum decision level has
been reduced by one. At this point, we restart and do exactly the same sequence of
branchings; each time, as argued above, we reduce the number of literals at the
maximum decision level by 1. Since $\neg\ell$ is a literal at the maximum decision level,
it follows that after at most O(n) restarts (and $O(n^2)$ learned clauses) we will
have absorbed the clause C at $\ell$. Repeating this process at most n times, once for
each empowering literal in C, we can absorb C, and it is clear from the analysis that
the number of learned clauses is polynomial.

We are now ready to finish the proof. Apply the claim repeatedly to the first 
empowering and unit-refutable clause in $\Pi$ to absorb that clause; by Lemma 6,
such a clause will exist as long as the CNF F is not unit-refutable. A DPLL(T)
solver can obtain an arbitrary theory clause by setting relevant literals in the
branching sequence and using theory propagation. Since the proof $\Pi$ is finite
(length m), it follows that this process must terminate after at most m
iterations. At this point, there cannot be such an empowering and unit-refutable
clause, and so by Lemma 6 it follows that F (with its learned clauses) is now 
unit-refutable, and so the DPLL(T) algorithm halts and outputs UNSAT. 


The reverse direction of the theorem is straightforward, and thus we have 
the following corollary: 


Corollary 8. The DPLL(T) system with an asserting clause learning scheme, 
non-deterministic branching and T-propagation is polynomially equivalent to 
Res(T). 

A key point of the above simulation is that it does not depend on whether
the T-propagation scheme is weak or strong: since the clauses learned
by the scheme are specified in advance by the extended branching sequence, the
same proof will apply if we begin with a Res*(T) proof instead. Of course, if we
begin with a Res*(T) proof instead of a Res(T) proof, we may use the full power
of the theory derivation rule, requiring that we use a DPLL*(T) algorithm with
a strong T-propagation scheme instead. We record this observation as a second
theorem.


Theorem 9. The DPLL*(T) system with an asserting clause learning scheme, 
non-deterministic branching and T-propagation is polynomially equivalent to 
Res*(T).


5 Case Studies: Resolution Modulo Common Theories 


In this section, we study the power of Res(T) over theories that are common in 
the SMT context—namely, we focus on the theory of equality E, the theory of 
uninterpreted function symbols EUF, and the theory of linear arithmetic LA. 


5.1 Resolution over E: A Theory of Equality 


We first consider E, the theory of equality. Bjørner et al. [BDdM08] introduced a
proof-theoretic calculus called SP(E) for reasoning over the theory of equality;
in a prototype of our main result, they showed that proofs in SP(E) exactly
characterize proofs produced by a simple model SMT solver. In this section
we show that the proof system Res*(E) is polynomially equivalent to SP(E), which is
evidence that our general framework is the correct way of capturing the power
of SMT solvers.

Let us first reproduce the rules of SP(E) from [BDdM08]:

Cut. From $C \lor \ell$ and $D \lor \neg\ell$ derive $C \lor D$.

E-Dis. From $C \lor a \neq a$ derive $C$.

E-Eqs. From $C \lor a = b \lor a = c$ derive $C \lor a = b \lor b \neq c$.

Sup. From $C \lor a = b$ and $D[a]$ derive $C \lor D[b]$.

Observe that the Sup rule allows replacing some occurrences of a term $a$ in atoms
of a clause $D$ with $b$ (not necessarily all occurrences of $a$). Both the Sup rule
and the E-Eqs rule can introduce literals that did not occur in the initial formula.


Proposition 10. Res*(E) and SP(E) are polynomially equivalent. 


Proof (Sketch). Bjørner et al. show that SP(E) exactly characterizes the proofs
produced by a simple theoretical model of an SMT solver, which we will denote
by DPLL(e + A) [BDdM08, Theorem 4.1]. Examining the solver DPLL(e + A)
from [BDdM08], it is not hard to see that it is equivalent to the algorithm DPLL*(E)
(that is, DPLL(E) with a strong T-propagation rule). The equivalence between
Res*(E) and DPLL*(E) then follows from Theorem 9.


In the conclusion of [BDdM08] it is stated that there are no short SP(E)
proofs of the following encoding of the pigeonhole principle (PHP): there are
clauses of the form $(d_i = r_1 \lor \dots \lor d_i = r_n)$, for $i \in [1, n+1]$, enforcing that
the $i$th pigeon must travel to some hole, and clauses of the form $(d_i \neq d_j)$ for
distinct $i, j \in [1, n+1]$ which, when combined with the first family of clauses and the
transitivity axioms of E, imply that no two pigeons can travel to the same hole.
Since their SP(E) system is equivalent to Res*(E), it follows that the lower bounds
on SP(E) carry over:


Corollary 11. If SP(E) does not have polynomial-size refutations of the
pigeonhole principle, then neither does Res*(E).
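For concreteness, the following Python sketch generates the PHP encoding described above. It is our own illustration: literals are hypothetical (lhs, op, rhs) triples over constant names d1, ..., d(n+1) and r1, ..., rn, and the transitivity axioms are left implicit, since they belong to the theory E rather than to the input CNF.

```python
from itertools import combinations
from typing import List, Tuple

Literal = Tuple[str, str, str]  # (lhs, "=" or "!=", rhs) over constant symbols
Clause = List[Literal]


def php_equality_encoding(n: int) -> List[Clause]:
    """Encode PHP with n + 1 pigeons and n holes over the theory of equality."""
    pigeons = [f"d{i}" for i in range(1, n + 2)]
    holes = [f"r{j}" for j in range(1, n + 1)]
    clauses: List[Clause] = []
    # Each pigeon d_i travels to some hole: d_i = r_1 or ... or d_i = r_n.
    for d in pigeons:
        clauses.append([(d, "=", r) for r in holes])
    # Distinct pigeons are distinct terms: unit clauses d_i != d_j for i < j.
    for d, e in combinations(pigeons, 2):
        clauses.append([(d, "!=", e)])
    return clauses


clauses = php_equality_encoding(2)
print(len(clauses))  # 6: three "some hole" clauses plus three disequalities
```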


5.2 Resolution over EUF: Equality with Uninterpreted Functions 


Next, we study the theory EUF, which is an extension of the theory of equality
to contain uninterpreted function symbols. The signature of EUF consists of
an unlimited set of uninterpreted function symbols and constant symbols; a
term in the theory is thus inductively defined as either a constant symbol or an
application of a function symbol to a sequence of terms: $f(t_1, \dots, t_k)$. There is
one relational symbol $=$ interpreted as equality between terms, so theory literals
of EUF are of the form $t = t'$ for terms $t, t'$.

The axioms of EUF state that $=$ is an equivalence relation, together with
a family of congruence axioms for the function symbols stating, for any $k$-ary
function symbol $f$ and any sequences of terms $t_1, \dots, t_k$ and $t'_1, \dots, t'_k$,
that if $t_1 = t'_1, \dots, t_k = t'_k$, then $f(t_1, \dots, t_k) = f(t'_1, \dots, t'_k)$. The decision
problem for conjunctions of EUF literals can be solved in time $O(n \log n)$ by the
Downey-Sethi-Tarjan congruence closure algorithm [DST80].
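The idea behind congruence closure can be illustrated in a few lines. The Python sketch below is explicitly not the $O(n \log n)$ Downey-Sethi-Tarjan algorithm: it is a naive union-find fixpoint of our own that repeatedly merges applications of the same symbol whose arguments are already equal. Terms are assumed to be strings (constants) or tuples `("f", t1, ..., tk)` (applications); the class and function names are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Term = Union[str, Tuple]  # a constant "a", or an application ("f", t1, ..., tk)


class CongruenceClosure:
    """Naive congruence closure: union-find plus a fixpoint over applications."""

    def __init__(self) -> None:
        self.parent: Dict[Term, Term] = {}
        self.apps: List[Tuple] = []

    def add(self, t: Term) -> None:
        """Register t and all its subterms."""
        if t in self.parent:
            return
        self.parent[t] = t
        if isinstance(t, tuple):
            self.apps.append(t)
            for arg in t[1:]:
                self.add(arg)

    def find(self, t: Term) -> Term:
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def _close(self) -> None:
        # Congruence: same symbol and pairwise-equal arguments force equal applications.
        changed = True
        while changed:
            changed = False
            for s in self.apps:
                for t in self.apps:
                    if (s[0] == t[0] and len(s) == len(t)
                            and self.find(s) != self.find(t)
                            and all(self.find(u) == self.find(v)
                                    for u, v in zip(s[1:], t[1:]))):
                        self.parent[self.find(s)] = self.find(t)
                        changed = True

    def merge(self, s: Term, t: Term) -> None:
        self.add(s)
        self.add(t)
        self.parent[self.find(s)] = self.find(t)
        self._close()

    def equal(self, s: Term, t: Term) -> bool:
        self.add(s)
        self.add(t)
        self._close()
        return self.find(s) == self.find(t)


def euf_satisfiable(eqs: List[Tuple[Term, Term]], diseqs: List[Tuple[Term, Term]]) -> bool:
    """A conjunction of (dis)equalities is EUF-satisfiable iff no disequality
    relates two terms in the same congruence class."""
    cc = CongruenceClosure()
    for s, t in eqs:
        cc.merge(s, t)
    return not any(cc.equal(s, t) for s, t in diseqs)


print(euf_satisfiable([("a", "b")], [(("f", "a"), ("f", "b"))]))  # False
```

The quadratic scan over pairs of applications keeps the sketch short; an efficient implementation would instead index applications by the classes of their arguments.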

Using EUF as a central example, Bjørner and de Moura [BM14] observed that
DPLL(T) suffers some serious limitations in terms of access to the underlying
theory. To resolve this, they modified DPLL(EUF) with a set of non-deterministic
rules that allowed it to dynamically introduce clauses corresponding to the
congruence and transitivity axioms. To characterize the strength of this new
algorithm, they introduced a variant of resolution called E-Res, extending SP(E)
from [BDdM08] to reasoning over uninterpreted functions. We show that the
Res*(EUF) proof system can polynomially simulate the E-Res system, which
again suggests that we have the "correct" proof system for capturing SMT
reasoning. Due to space considerations, we leave the proof to the full version of the
paper.

Theorem 12. The system E-Res is polynomially simulated by Res*(EUF).


However, unlike the case of SP(E), the converse direction is not so clear. The
theory rule in Res*(EUF) is fundamentally semantic: it allows one to derive any
clause which follows from the theory EUF semantically; this is in contrast to
the E-Res system, which is fundamentally syntactic. Thus, to show that E-Res
polynomially simulates Res*(EUF), one would need to show that any use of the theory
rule in a Res*(EUF) proof could somehow be replaced with a short proof in
E-Res. We leave this as an open problem.

Next, we show that Res*(EUF) and E-Res can efficiently simulate the Frege 
proof system, which is a very powerful propositional proof system studied in 
proof complexity. We note that the simulation crucially relies on the introduction 
of new theory literals; this suggests that an SMT solver which can intelligently
introduce new theory literals has the potential to be extremely powerful. 


Theorem 13. Res*(EUF) (and, in fact, E-Res) can efficiently simulate the 
Frege proof system. 


Proof Sketch. We show the stronger statement that E-Res simulates Frege. The
idea of the proof is to introduce constants $e_0 \neq e_1$ corresponding to FALSE and
TRUE; every positive literal $x$ in the original formula is replaced by $x = e_1$, and
every negative literal $\neg x$ by $x = e_0$. Then introduce uninterpreted function symbols
$N, O, A$, together with constraints that make $N, O, A$ behave as NOT, OR and
AND, respectively (such as $N(e_0) = e_1 \land N(e_1) = e_0$). Formulas in the Frege
refutation are thus iteratively transformed into expressions of the form $t_F = e_0$ or
$t_F = e_1$, where $t_F$ is the term obtained by replacing the Boolean connectives in a
formula $F$ by $N, O, A$. As the Frege proof ends with an empty sequent, the
corresponding E-Res proof ends with an empty clause. See the full version for
details.
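The translation of formulas into terms used in this sketch can be made concrete. The Python code below is a hypothetical rendering of that one step only, not the E-Res simulation from the full version: Boolean formulas are assumed to be nested tuples, `to_term` replaces the connectives by the symbols N, O, A, and a table playing the role of the defining constraints (such as $N(e_0) = e_1 \land N(e_1) = e_0$) lets us check that $t_F$ evaluates to $e_1$ exactly when $F$ is true.

```python
from itertools import product
from typing import Dict, Sequence, Tuple, Union

Formula = Union[str, Tuple]  # variable "x", or ("not", F), ("or", F, G), ("and", F, G)
Term = Union[str, Tuple]     # variable/constant, or ("N", t), ("O", s, t), ("A", s, t)

E0, E1 = "e0", "e1"  # constants standing for FALSE and TRUE
SYMBOL = {"not": "N", "or": "O", "and": "A"}

# Intended interpretation of N, O, A on {e0, e1}, mirroring the defining constraints.
TABLE = {
    ("N", E0): E1, ("N", E1): E0,
    ("O", E0, E0): E0, ("O", E0, E1): E1, ("O", E1, E0): E1, ("O", E1, E1): E1,
    ("A", E0, E0): E0, ("A", E0, E1): E0, ("A", E1, E0): E0, ("A", E1, E1): E1,
}


def to_term(f: Formula) -> Term:
    """Replace Boolean connectives by the uninterpreted symbols N, O, A."""
    if isinstance(f, str):
        return f
    return (SYMBOL[f[0]],) + tuple(to_term(g) for g in f[1:])


def eval_term(t: Term, env: Dict[str, str]) -> str:
    """Evaluate a term once each variable is mapped to e0 or e1."""
    if isinstance(t, str):
        return env[t]
    return TABLE[(t[0],) + tuple(eval_term(s, env) for s in t[1:])]


def eval_formula(f: Formula, env: Dict[str, bool]) -> bool:
    if isinstance(f, str):
        return env[f]
    args = [eval_formula(g, env) for g in f[1:]]
    if f[0] == "not":
        return not args[0]
    return any(args) if f[0] == "or" else all(args)


def translation_agrees(f: Formula, variables: Sequence[str]) -> bool:
    """Check that t_F = e1 iff F is true, over all 0/1 assignments."""
    for bits in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, bits))
        term_env = {v: (E1 if b else E0) for v, b in env.items()}
        if (eval_term(to_term(f), term_env) == E1) != eval_formula(f, env):
            return False
    return True


print(to_term(("or", "x", ("not", "x"))))  # ('O', 'x', ('N', 'x'))
```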


5.3 Resolution over LA: A Theory of Linear Arithmetic 


Finally, we study the theory of linear arithmetic LA. A formula in the theory LA
over a domain $D$ is a conjunction of expressions of the form $\sum_{i=1}^{n} a_i x_i \circ b$, where
$\circ \in \{=, \leq, <, \neq, \geq, >\}$ and $a_i, x_i \in D$; usually, $D$ is the integers or the reals³. We
show that Res(LA) polynomially simulates the proof system R(lin) introduced
by Raz and Tzameret [RT08]. This is interesting, as R(lin) has polynomial-size
proofs of several difficult tautologies considered in proof complexity, such as the
pigeonhole principle, Tseitin tautologies and the clique-colouring principle.

In the proof system R(lin), propositional variables are linear equations over
the integers. The input formula is a CNF over such equations, together with
clauses $\bigwedge_{i=1}^{n} (x_i = 0 \lor x_i = 1)$ ensuring a 0/1 assignment. The rules of
inference consist of a modified resolution rule, together with two structural rules,
weakening and simplification:


R(lin)-cut. Let $(A \lor L_1)$ and $(B \lor L_2)$ be two clauses containing linear equations
$L_1$ and $L_2$, respectively. From these two clauses, derive the clause
$(A \lor B \lor (L_1 - L_2))$.

Weakening. From a (possibly empty) clause $A$ derive $(A \lor L)$ for any equation $L$.

Simplification. From $(A \lor k = 0)$, where $k \neq 0$ is a constant, derive $A$.
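The two rules that manipulate equations can be sketched on a concrete representation. In the Python code below (an illustration under our own assumptions, with hypothetical names), a linear equation $\sum_i a_i x_i = b$ is stored as a coefficient dictionary plus a right-hand side, and a clause as a list of equations. R(lin)-cut appends $L_1 - L_2$, and simplification drops any disjunct with no variables left and a nonzero constant, which is the form $k = 0$ with $k \neq 0$.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Equation:
    """A linear equation sum(coeffs[v] * v) = rhs over integer variables."""
    coeffs: Dict[str, int]
    rhs: int

    def __sub__(self, other: "Equation") -> "Equation":
        names = set(self.coeffs) | set(other.coeffs)
        diff = {v: self.coeffs.get(v, 0) - other.coeffs.get(v, 0) for v in names}
        return Equation({v: c for v, c in diff.items() if c != 0}, self.rhs - other.rhs)

    def is_false_constant(self) -> bool:
        # 0 = rhs with rhs != 0, i.e. an equation k = 0 with k a nonzero constant.
        return not self.coeffs and self.rhs != 0


Clause = List[Equation]  # a disjunction of equations


def rlin_cut(c1: Clause, i: int, c2: Clause, j: int) -> Clause:
    """From (A v L1) and (B v L2), with L1 = c1[i] and L2 = c2[j],
    derive (A v B v (L1 - L2))."""
    rest_a = c1[:i] + c1[i + 1:]
    rest_b = c2[:j] + c2[j + 1:]
    return rest_a + rest_b + [c1[i] - c2[j]]


def simplify(clause: Clause) -> Clause:
    """Drop every disjunct of the form k = 0 with k a nonzero constant."""
    return [eq for eq in clause if not eq.is_false_constant()]


x_is_0 = Equation({"x": 1}, 0)
x_is_1 = Equation({"x": 1}, 1)
print(simplify(rlin_cut([x_is_1], 0, [x_is_0], 0)))  # []: the empty clause
```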


Proposition 14. Res(LA) polynomially simulates R(lin).


³ Some definitions of linear arithmetic do not include disequalities; however, as
disequalities and strict inequalities occur naturally in the SMT context, SMT-oriented linear
arithmetic solvers do incorporate mechanisms for dealing with them.
Proof. We show how to simulate the rules of R(lin) in Res(LA). We can assume,
without loss of generality, that Res(LA) has a weakening rule, which simulates
weakening of R(lin) directly. For the simplification rule, note that $LA \models k \neq 0$
for any constant $k \neq 0$; one application of the resolution rule on $(k \neq 0)$ and $(A \lor k = 0)$
results in $A$.

Finally, let $L_1$ be $\sum_i a_i x_i = b$ and $L_2$ be $\sum_i c_i x_i = d$. From $(A \lor L_1)$ and
$(B \lor L_2)$ we want to derive $(A \lor B \lor (L_1 - L_2))$. First derive in LA the clause
$C = (\sum_i a_i x_i \neq b \lor \sum_i c_i x_i \neq d \lor \sum_i (a_i - c_i) x_i = b - d)$. Resolving $(A \lor L_1)$
with $C$, and then resolving the resulting clause with $(B \lor L_2)$, gives the desired
$(A \lor B \lor (L_1 - L_2))$.


Note that we didn’t need to specify whether LA is over the integers, rationals 
or reals, and hence the proof works for any of them. Also, in order to establish 
our simulations it is sufficient to consider a fragment of LA with only equalities 
and inequalities, and produce only unit clauses and width-3 clauses of a fixed 
form. 


Corollary 15. Res(LA) has polynomial-size proofs of the pigeonhole principle,
Tseitin tautologies and a clique-colouring principle for $k = \sqrt{n}$ size clique and
$k' = (\log n)^2 / (8 \log \log n)$ size colouring.


6 Lazy vs. Eager Reductions and the Exponential Time Hypothesis


Throughout this paper we have primarily discussed the Lazy approach to SMT. 
In this section, we consider the Eager approach, in which an input formula F 
over a theory T is reduced to an equisatisfiable propositional formula G, which 
is then solved using a suitable (Boolean) solver. 

The Eager approach is still used in several modern SMT solvers such as the 
STP solver for bit-vectors and arrays [GD07]. A common eager reduction used 
when solving equations over the theory of equality, E (or its generalization to 
uninterpreted function symbols EUF), is the Ackermann reduction. Let us first 
describe a simple version of the Ackermann reduction over the theory E. 

Let F denote a CNF over literals from the theory E (so each literal is of
the form $a = b$ for constant terms $a, b$), which we will ultimately transform into
a Boolean SAT instance. Let n denote the number of constant terms occurring
in F, let m denote the number of distinct literals occurring in F, and consider
the literal $a = b$ and the literal $b = a$ to be the same. For each literal $a = b$
introduce a Boolean variable $x_{a=b}$, and for each clause of literals $\bigvee_i a_i = b_i$ create
a clause $\bigvee_i x_{a_i=b_i}$. To encode the transitivity of equality, for each triple of terms
$(a, b, c)$ occurring in the initial CNF F introduce a clause of the form
$\neg x_{a=b} \lor \neg x_{b=c} \lor x_{a=c}$. Note that the final formula will have $O(n^2)$ Boolean variables,
one for each possible literal $a = b$ (a potential quadratic blow-up),
which is unavoidable using this encoding due to the transitivity axioms. Observe
that this blow-up only occurs in the eager approach; in the lazy approach to
solving we only need to consider the literals $a = b$ which occur in the original


The Proof Complexity of SMT Solvers 291 


formula F. It is therefore natural to wonder if this blow-up in the number of 
input variables can somehow be avoided. 
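The simple Ackermann reduction described above can be sketched in a few lines. The Python code below is our own illustration: the (a, b, positive) literal triples and the DIMACS-style variable numbering are assumptions, and the function name is hypothetical. It allocates one Boolean variable per unordered pair of constants (so $a = b$ and $b = a$ coincide) and emits, for every triple of constants, the three rotations of the transitivity clause, which is where the quadratic number of variables comes from.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

ELiteral = Tuple[str, str, bool]  # (a, b, True) for a = b, (a, b, False) for a != b


def ackermann_reduce(cnf: List[List[ELiteral]]) -> Tuple[List[List[int]], Dict[FrozenSet[str], int]]:
    """Reduce a CNF over equality literals to an equisatisfiable Boolean CNF."""
    constants = sorted({t for clause in cnf for (a, b, _) in clause for t in (a, b)})
    var: Dict[FrozenSet[str], int] = {}

    def x(a: str, b: str) -> int:
        # One Boolean variable per unordered pair {a, b}.
        key = frozenset((a, b))
        if key not in var:
            var[key] = len(var) + 1
        return var[key]

    clauses: List[List[int]] = []
    for clause in cnf:
        clauses.append([x(a, b) if pos else -x(a, b) for (a, b, pos) in clause])
    # Transitivity: for every triple, any two equalities force the third.
    for a, b, c in combinations(constants, 3):
        for p, q, r in ((a, b, c), (b, c, a), (c, a, b)):
            clauses.append([-x(p, q), -x(q, r), x(p, r)])
    return clauses, var


cnf = [[("a", "b", True)], [("b", "c", True)], [("a", "c", False)]]
clauses, var = ackermann_reduce(cnf)
print(clauses)  # [[1], [2], [-3], [-1, -2, 3], [-2, -3, 1], [-3, -1, 2]]
```

The resulting clause set is unsatisfiable, as expected, since $a = b$ and $b = c$ contradict $a \neq c$ through the first transitivity clause.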

In fact, one can construct a more clever Eager reduction from E-SAT to SAT
which only introduces $O(n \log n)$ Boolean variables; however, this more clever
encoding does not represent the literals $a = b$ as Boolean variables $x_{a=b}$ and
instead uses a more complicated pointer construction. This improved reduction
turns out to be the best possible under the well-known (and widely believed)
Exponential Time Hypothesis, which is a strengthening of $P \neq NP$.


Exponential Time Hypothesis (ETH). There is no deterministic or randomized
algorithm for SAT running in time $2^{o(n)}$, where n is the number of
input variables.


Theorem 16. Let F be an instance of E-SAT with n distinct terms. For any
polynomial-time reduction R from E-SAT to SAT, the Boolean formula R(F)
must have $\Omega(n \log n)$ variables unless ETH fails.


Proof. By way of contradiction, suppose that ETH holds and let R be a reduction
from E-SAT to SAT which introduces $o(n \log n)$ variables. Let 2-CSP denote a
constraint satisfaction problem with two variables per constraint. The theorem
follows almost immediately from the following result of Traxler [Tra08].


Theorem 17 (Theorem 1 in [Tra08], Rephrased). Consider any 2-CSP
$C_1 \land C_2 \land \dots \land C_m$ over an alphabet $\Sigma$ of size d, where each constraint is of the form
$x \neq a \lor y \neq b$ for variables $x, y$ and constants $a, b \in \Sigma$. Unless ETH fails, every
algorithm for this problem requires time $d^{cn}$ for some universal constant $c > 0$.


There is a simple reduction from the restriction of 2-CSP described in the
above theorem to E-SAT. Introduce terms $e_1, e_2, \dots, e_d$, each intended to
represent a symbol from the alphabet $\Sigma$, and also terms $x_1, x_2, \dots, x_n$, one for each
variable occurring in the original CSP instance. Now, for each $i \neq j$ introduce
unit clauses $e_i \neq e_j$, and similarly for each $i \in [n]$ add a clause of the form
$x_i = e_1 \lor x_i = e_2 \lor \dots \lor x_i = e_d$. Finally, for each constraint in the 2-CSP of
the form $x_i \neq a \lor x_j \neq b$ introduce a clause $x_i \neq e_a \lor x_j \neq e_b$, where $e_a, e_b$
are the terms corresponding to the symbols $a, b$. Let F′ denote the final E-SAT
instance. It is clear that F′ is satisfiable if and only if the original 2-CSP is
satisfiable, and also that F′ has $n + d$ constant terms.

Now, apply the reduction R to F′, obtaining a SAT instance
R(F′). By assumption the final SAT instance has $o((n + d) \log(n + d))$ variables;
running the standard brute-force algorithm for SAT gives an algorithm running
in $2^{o((n+d) \log(n+d))}$ time for the 2-CSP variant described above. However, by the
above theorem, every algorithm for this 2-CSP variant requires time at least
$d^{cn} = 2^{cn \log d}$. For $d = n$ the brute-force bound becomes $2^{o(n \log n)}$, whereas the
lower bound is $2^{cn \log n}$, which is impossible unless ETH fails.
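The reduction from the restricted 2-CSP to E-SAT used in this proof is short enough to sketch directly. The Python below is an illustration of our own: the term names `e_<symbol>` and `x_<i>`, the (s, t, positive) literal triples and the constraint tuples (i, a, j, b) standing for $x_i \neq a \lor x_j \neq b$ are all assumptions. The helper `constant_terms` confirms that F′ has $n + d$ constant terms.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

ELiteral = Tuple[str, str, bool]  # (s, t, True) means s = t, (s, t, False) means s != t


def csp_to_esat(n: int, alphabet: Sequence[str],
                constraints: List[Tuple[int, str, int, str]]) -> List[List[ELiteral]]:
    """Each constraint (i, a, j, b) encodes x_i != a or x_j != b (1-based i, j)."""
    e = {s: f"e_{s}" for s in alphabet}        # one term per alphabet symbol
    xs = [f"x_{i}" for i in range(1, n + 1)]   # one term per CSP variable
    cnf: List[List[ELiteral]] = []
    for s, t in combinations(alphabet, 2):     # symbols are pairwise distinct
        cnf.append([(e[s], e[t], False)])
    for x in xs:                               # each variable takes some symbol
        cnf.append([(x, e[s], True) for s in alphabet])
    for i, a, j, b in constraints:             # the 2-CSP constraints themselves
        cnf.append([(xs[i - 1], e[a], False), (xs[j - 1], e[b], False)])
    return cnf


def constant_terms(cnf: List[List[ELiteral]]) -> Set[str]:
    return {t for clause in cnf for (s, u, _) in clause for t in (s, u)}


cnf = csp_to_esat(2, ["p", "q"], [(1, "p", 2, "p")])
print(len(constant_terms(cnf)))  # 4 = n + d
```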


7 Conclusion 


In this paper, we studied SMT solvers through the lens of proof complexity,
introducing a generalization of the resolution proof system and arguing that it
correctly models the "lazy" SMT framework DPLL(T) [NOT06]. We further
presented and analyzed a stronger version Res*(T) that allows for the introduction
of new literals, and showed that it models DPLL*(T), which is a modification
of an SMT solver that can introduce new theory literals; this captures the new
literal introduction in solvers such as Yices and Z3 [Z3, Yic].

There are many natural directions to pursue. First, although we have not
considered it here, it is natural to introduce an intermediate proof system between
Res(T) and Res*(T) which is allowed to introduce new theory literals but not
new theory variables. For instance, if we have the formula $a = f(b) \land a = c$
in EUF, then this intermediate proof system could introduce the theory literal
$c = f(b)$ but not the theory literal $f(c) = f(a)$, whereas both are allowed to
be introduced by Res*(T). It is not clear to us whether this intermediate system can
simulate Frege, and we suggest studying it in its own right.

A second direction that we believe is quite interesting is extending our results 
on EUF to capture the extended Frege system, which is the most powerful proof 
system typically studied in propositional proof complexity. Intuitively, it seems
that EUF by itself is not strong enough to capture extended Frege; we consider 
finding a new theory T which can capture it an interesting open problem. 
Abstract. We focus in this paper on generating models of quantified
first-order formulas over built-in theories, which is paramount in software
verification and bug finding. While standard methods are either geared
toward proving the absence of a solution or targeted to specific theories,
we propose a generic and radically new approach based on a reduction to
the quantifier-free case. Our technique thus allows reusing all the efficient
machinery developed for that context. Experiments show a substantial
improvement over state-of-the-art methods.


1 Introduction 


Context. Software verification methods have come to rely increasingly on rea- 
soning over logical formulas modulo theory. In particular, the ability to generate 
models (i.e., find solutions) of a formula is of utmost importance, typically in the 
context of bug finding or intensive testing—symbolic execution [21] or bounded 
model checking [7]. Since quantifier-free first-order formulas on well-suited theo- 
ries are sufficient to represent many reachability properties of interest, the Satis- 
fiability Modulo Theory (SMT) [6,25] community has primarily dedicated itself 
to designing solvers able to efficiently handle such problems. 

Yet, universal quantifiers are sometimes needed, typically when consider- 
ing preconditions or code abstraction. Unfortunately, most theories handled by 
SMT-solvers are undecidable in the presence of universal quantifiers. There exist 
dedicated methods for a few decidable quantified theories, such as Presburger 
arithmetic [9] or the array property fragment [8], but there is no general and 
effective enough approach for the model generation problem over universally 
quantified formulas. Indeed, generic solutions for quantified formulas involving 
heuristic instantiation and refutation are best geared to proving the
unsatisfiability of a formula (i.e., absence of solution) [13,20], while recent proposals
such as local theory extensions [2], finite instantiation [31,32] or model-based
instantiation [20,29] are either too narrow in scope, or handle quantifiers on free
sorts only, or restrict themselves to finite models, or may get stuck in infinite
refinement loops.


Goal and Challenge. Our goal is to propose a generic and efficient approach to 
the model generation problem over arbitrary quantified formulas with support 
for theories commonly found in software verification. Due to the huge effort 
made by the community to produce state-of-the-art solvers for quantifier-free 
theories (QF-solvers), it is highly desirable for this solution to be compatible 
with current leading decision procedures, namely SMT approaches. 


Proposal. Our approach turns a quantified formula into a quantifier-free for- 
mula with the guarantee that any model of the latter contains a model of the 
former. The benefits are threefold: the transformed formula is easier to solve, 
it can be sent to standard QF-solvers, and a model for the initial formula is 
deducible from a model of the transformed one. The idea is to ignore quanti- 
fiers but strengthen the quantifier-free part of the formula with an independence 
condition constraining models to be independent from the (initially) quantified 
variables. 


Contributions. This paper makes the following contributions: 


We propose a novel and generic framework for model generation of quantified
formulas (Sect. 5, Algorithm 1) relying on the inference of sufficient
independence conditions (Sect. 4). We prove its correctness (Theorem 1, mechanized in
Coq) and its efficiency under reasonable assumptions (Propositions 4 and 5).
In particular, our approach incurs only a linear overhead in the formula size.
We also briefly study its completeness, related to the notion of weakest
independence condition.

We define a taint-based procedure for the inference of independence conditions 
(Sect. 5.2), composed of a theory-independent core (Algorithm 2) together 
with theory-dependent refinements. We propose such refinements for a large 
class of operators (Sect. 6.2), encompassing notably arrays and bitvectors. 

Finally, we present a concrete implementation of our method specialized for
arrays and bitvectors (Sect. 7). Experiments on SMT-LIB benchmarks and 
software verification problems notably demonstrate that we are able not only 
to very effectively lift quantifier-free decision procedures to the quantified 
case, but also to supplement recent advances, such as finite or model-based 
quantifier instantiation [20,29,31,32]. Indeed, we concretely supply SMT 
solvers with the ability to efficiently address an extended set of software ver- 
ification questions. 


Discussions. Our approach supplements state-of-the-art model generation on 
quantified formulas by providing a more generic handling of satisfiable problems. 
We can deal with quantifiers on any sort and we are not restricted to finite
models. Moreover, this is a lightweight preprocessing approach requiring a single call
to the underlying quantifier-free solver. The method also extends to partial
elimination of universal quantifiers, or reduction to quantified-but-decidable formulas
(Sect. 5.4).

While techniques à la E-matching allow lifting quantifier-free solvers to the
unsatisfiability checking of quantified formulas, this work provides a mechanism
to lift them to the satisfiability checking and model generation of quantified
formulas, yielding a more symmetric handling of quantified formulas in SMT.
This new approach paves the way to future developments such as the definition of 
more precise inference mechanisms of independence conditions, the identification 
of interesting subclasses for which inferring weakest independence conditions is 
feasible, and the combination with other quantifier instantiation techniques. 


2 Motivation 


Let us take the code sample in Fig. 1 and suppose we want to reach function
analyze_me. For this purpose, we need a model (a.k.a. solution) of the
reachability condition $\phi \triangleq ax + b > 0$, where a, b and x are symbolic variables associated
with the program variables a, b and x. However, while the values of a and b are
user-controlled, the value of x is not. Therefore, if we want to reach analyze_me in
a reproducible manner, we actually need a model of $\phi_\forall \triangleq \forall x.\, ax + b > 0$, which
involves universal quantification. While this specific formula is simple, model
generation for quantified formulas is notoriously difficult: PSPACE-complete for
booleans, undecidable for uninterpreted functions or arrays.


int main () {
  int a = input ();
  int b = input ();
  int x = rand ();
  if (a * x + b > 0) {
    analyze_me();
  }
  else
    ...
}

Quantified reachability condition
(1) $\forall x.\, ax + b > 0$

Taint variable constraint
(2) $a^\bullet \land b^\bullet \land \neg x^\bullet$   ($a^\bullet, b^\bullet, x^\bullet$: fresh boolean)

Independence condition
(3) $((a^\bullet \land x^\bullet) \lor (a^\bullet \land a = 0) \lor (x^\bullet \land x = 0)) \land b^\bullet$
(4) $((\top \land \bot) \lor (\top \land a = 0) \lor (\bot \land x = 0)) \land \top$
(5) $a = 0$

Quantifier-free approximation of (1)
(6) $(ax + b > 0) \land (a = 0)$

Fig. 1. Motivating example


Reduction to the Quantifier-Free Case Through Independence. We propose
to ignore the universal quantification over x, but restrict models to those
which do not depend on x. For example, model {a = 1, x = 1, b = 0} does depend
on x, as taking x = 0 invalidates the formula, while model {a = 0, x = 1, b = 1}
is independent of x. We call constraint $\psi \triangleq (a = 0)$ an independence condition:
any interpretation of $\phi$ satisfying $\psi$ will be independent of x, and therefore a
model of $\phi \land \psi$ will give us a model of $\phi_\forall$.


Inference of Independence Conditions Through Tainting. Figure 1
details in its right part a way to infer such independence conditions. Given a
quantified reachability condition (1), we first associate to every variable v a
(boolean) taint variable $v^\bullet$ indicating whether the solution may depend on v
(value $\top$) or not (value $\bot$). Here, $x^\bullet$ is set to $\bot$, and $a^\bullet$ and $b^\bullet$ are set to $\top$ (2).
An independence condition (3), a formula modulo theory, is then constructed
using both initial and taint variables. We extend taint constraints to terms, $t^\bullet$
indicating here whether t may depend on x or not, and we require the top-level
term (i.e., the formula) to be tainted to $\top$ (i.e., to be independent of x). Condition
(3) reads as follows: in order to enforce that $(ax + b > 0)^\bullet$ holds, we enforce
that $(ax)^\bullet$ and $b^\bullet$ hold, and for $(ax)^\bullet$ we require that either $a^\bullet$ and $x^\bullet$ hold, or
$a^\bullet$ holds and $a = 0$ (absorbing the value of x), or the symmetric case. We see
that $\cdot^\bullet$ is defined recursively and combines a systematic part (if $t^\bullet$ holds then
$f(t)^\bullet$ holds, for any f) with a theory-dependent part (here, based on $\times$). After
simplifications (4), we obtain $a = 0$ as an independence condition (5), which is
adjoined to the reachability condition freed of its universal quantification (6).
A QF-solver provides a model of (6) (e.g., {a = 0, b = 1, x = 5}), lifted into a
model of (1) by discarding the valuation of x (e.g., {a = 0, b = 1}).
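The derivation of conditions (3) to (6) can be replayed mechanically on this example. The Python sketch below is our own illustration and not the paper's Algorithm 2: it hard-codes only the taint rules described in the paragraph above (constants are independent, + and > propagate taints systematically, × also accepts a zero factor), represents conditions as Python predicates, and searches a small assumed integer box for a model of (6). Brute-force search stands in for the QF-solver, and the final check over a larger range is evidence on a finite sample, not a proof.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Expr = Tuple  # ("var", name) | ("const", k) | ("add", e, f) | ("mul", e, f) | ("gt", e, f)
Env = Dict[str, int]


def evaluate(e: Expr, env: Env):
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "const":
        return e[1]
    left, right = evaluate(e[1], env), evaluate(e[2], env)
    return {"add": left + right, "mul": left * right, "gt": left > right}[tag]


def taint(e: Expr, taints: Dict[str, bool]) -> Callable[[Env], bool]:
    """Build the independence condition for e as a predicate over the variables."""
    tag = e[0]
    if tag == "var":
        return lambda env: taints[e[1]]
    if tag == "const":
        return lambda env: True
    tl, tr = taint(e[1], taints), taint(e[2], taints)
    if tag == "mul":
        # Theory-dependent rule: a zero factor absorbs the other operand.
        return lambda env: ((tl(env) and tr(env))
                            or (tl(env) and evaluate(e[1], env) == 0)
                            or (tr(env) and evaluate(e[2], env) == 0))
    # Systematic rule for add and gt: independent operands give an independent result.
    return lambda env: tl(env) and tr(env)


# phi := a * x + b > 0, with x universally quantified: taints (2) are a, b -> True, x -> False.
phi = ("gt", ("add", ("mul", ("var", "a"), ("var", "x")), ("var", "b")), ("const", 0))
psi = taint(phi, {"a": True, "b": True, "x": False})

RANGE = range(-3, 4)  # assumed small search box standing in for a QF-solver


def find_model():
    """Search the box for a model of phi and psi, i.e. of formula (6)."""
    for a, b, x in product(RANGE, repeat=3):
        env = {"a": a, "b": b, "x": x}
        if evaluate(phi, env) and psi(env):
            return env
    return None


model = find_model()
print(model)  # {'a': 0, 'b': 1, 'x': -3}
# Discarding x gives {a = 0, b = 1}, which satisfies phi for every sampled x.
print(all(evaluate(phi, {"a": model["a"], "b": model["b"], "x": x}) for x in range(-100, 101)))  # True
```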

In this specific example the inferred independence condition (5) is the most 
generic one and (1) and (6) are equisatisfiable. Yet, in general it may be an 
under-approximation, constraining the variables more than needed and yielding 
a correct but incomplete decision method: a model of (6) can still be turned into 
a model of (1), but (6) might not have a model while (1) does.


3 Notations 


We consider the framework of many-sorted first-order logic with equality, and
we assume standard definitions of sorts, signatures and terms. Given a tuple of
variables $\mathbf{x} \triangleq (x_1, \dots, x_n)$ and a quantifier $Q$ ($\forall$ or $\exists$), we shorten $Q x_1 \dots Q x_n.\Phi$
as $Q\mathbf{x}.\Phi$. A formula is in prenex normal form if it is written as $Q_1 x_1 \dots Q_n x_n.\Phi$
with $\Phi$ a quantifier-free formula. A formula is in Skolem normal form if it is in
prenex normal form with only universal quantifiers. We write $\Phi(\mathbf{x})$ to denote
that the free variables of $\Phi$ are in $\mathbf{x}$. Let $\mathbf{t} \triangleq (t_1, \dots, t_n)$ be a term tuple; we
write $\Phi(\mathbf{t})$ for the formula obtained from $\Phi$ by replacing each occurrence of $x_i$
in $\Phi$ by $t_i$. An interpretation $\mathcal{I}$ associates a domain to each sort of a signature
and a value to each symbol of a formula, and $[\![\Delta]\!]_{\mathcal{I}}$ denotes the evaluation of
term $\Delta$ over $\mathcal{I}$. A satisfiability relation $\models$ between interpretations and formulas
is defined inductively as usual. A model of $\Phi$ is an interpretation $\mathcal{I}$ satisfying
$\mathcal{I} \models \Phi$. We sometimes refer to models as "solutions". Formula $\Psi$ entails formula
$\Phi$, written $\Psi \models \Phi$, if every interpretation satisfying $\Psi$ satisfies $\Phi$ as well. Two
formulas are equivalent, denoted $\Psi \equiv \Phi$, if they have the same models. A theory
$\mathcal{T} \triangleq (\Sigma, \mathcal{I})$ restricts symbols in $\Sigma$ to be interpreted in $\mathcal{I}$. The quantifier-free
fragment of $\mathcal{T}$ is denoted QF-$\mathcal{T}$.


Convention. Letters a, b, c, ... denote uninterpreted symbols and variables.
Letters x, y, z, ... denote quantified variables. $\mathbf{a}, \mathbf{b}, \mathbf{c}, \dots$ denote sets of uninterpreted
symbols. $\mathbf{x}, \mathbf{y}, \mathbf{z}, \dots$ denote sets of quantified variables. Finally, a, b, c, ... denote
valuations of associated (sets of) symbols.

In the rest of this paper, we assume w.l.o.g. that all formulas are in Skolem
normal form. Recall that any formula $\varphi$ in classical logic can be normalized into
a formula $\psi$ in Skolem normal form such that any model of $\varphi$ can be lifted
into a model of $\psi$, and vice versa. This strong relation, much closer to formula
equivalence than to formula equisatisfiability, ensures that our correctness and
completeness results all along the paper hold for arbitrarily quantified formulas.


Companion Technical Report. Additional technical details (proofs, experi- 
ments, etc.) are available online at http://benjamin.farinier.org/cav2018/. 


4 Musing with Independence 


4.1 Independent Interpretations, Terms and Formulas 


A solution of $\Phi(\mathbf{x}, \mathbf{a})$ does not depend on $\mathbf{x}$ if, with $\mathbf{a}$ fixed to its value in the
solution, $\Phi$ is always true or always false for all possible valuations of $\mathbf{x}$. More
formally, we define the independence of an interpretation of $\Phi$ w.r.t. $\mathbf{x}$ as follows:


Definition 1 (Independent interpretation) 


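- Let $\Phi(\mathbf{x}, \mathbf{a})$ be a formula with free variables $\mathbf{x}$ and $\mathbf{a}$. Then an interpretation $\mathcal{I}$
of $\Phi(\mathbf{x}, \mathbf{a})$ is independent of $\mathbf{x}$ if for all interpretations $\mathcal{J}$ equal to $\mathcal{I}$ except
on $\mathbf{x}$, $\mathcal{I} \models \Phi$ if and only if $\mathcal{J} \models \Phi$.

- Let $\Delta(\mathbf{x}, \mathbf{a})$ be a term with free variables $\mathbf{x}$ and $\mathbf{a}$. Then an interpretation $\mathcal{I}$ of
$\Delta(\mathbf{x}, \mathbf{a})$ is independent of $\mathbf{x}$ if for all interpretations $\mathcal{J}$ equal to $\mathcal{I}$ except on
$\mathbf{x}$, $[\![\Delta(\mathbf{x}, \mathbf{a})]\!]_{\mathcal{I}} = [\![\Delta(\mathbf{x}, \mathbf{a})]\!]_{\mathcal{J}}$.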


Regarding formula $ax + b > 0$ from Fig. 1, {a = 0, b = 1, x = 1} is independent
of x while {a = 1, b = 0, x = 1} is not. Considering term $(t[a \leftarrow b])[c]$, with
t an array written at index a and then read at index c, {a = 0, b = 42, c = 0, t = [...]}
is independent of t (it evaluates to 42) while {a = 0, b = 1, c = 2, t = [...]} is not
(it evaluates to t[2]). We now define independence for formulas and terms.
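Definition 1 can be checked by brute force when the varied variable ranges over a finite set. The Python sketch below is an illustration under assumed finite domains (x in [-3, 3], arrays as length-3 tuples over 0..2), not part of the paper's method: it changes only the designated variable and compares each result with the original one, reproducing the four verdicts stated above.

```python
from itertools import product
from typing import Callable, Dict, Iterable


def independent_of(evaluate: Callable[[Dict], object], interp: Dict,
                   var: str, values: Iterable) -> bool:
    """Definition 1 on a finite domain: every interpretation that differs from
    interp only on var must give the same result as interp itself."""
    reference = evaluate(interp)
    return all(evaluate({**interp, var: v}) == reference for v in values)


def linear(m: Dict) -> bool:
    """The formula a*x + b > 0 from Fig. 1."""
    return m["a"] * m["x"] + m["b"] > 0


def array_term(m: Dict) -> int:
    """The term (t[a <- b])[c]: store b at index a, then read index c."""
    t = m["t"]
    written = t[:m["a"]] + (m["b"],) + t[m["a"] + 1:]
    return written[m["c"]]


# Assumed finite search spaces: x in [-3, 3], arrays of length 3 over values 0..2.
XS = range(-3, 4)
ARRAYS = list(product(range(3), repeat=3))

print(independent_of(linear, {"a": 0, "b": 1, "x": 1}, "x", XS))  # True
print(independent_of(linear, {"a": 1, "b": 0, "x": 1}, "x", XS))  # False
print(independent_of(array_term, {"a": 0, "b": 42, "c": 0, "t": (0, 0, 0)}, "t", ARRAYS))  # True
```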


Definition 2 (Independent formula and term) 


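- Let $\Phi(\mathbf{x}, \mathbf{a})$ be a formula with free variables $\mathbf{x}$ and $\mathbf{a}$. Then $\Phi(\mathbf{x}, \mathbf{a})$ is
independent of $\mathbf{x}$ if $\forall \mathbf{x}.\forall \mathbf{y}.(\Phi(\mathbf{x}, \mathbf{a}) \Leftrightarrow \Phi(\mathbf{y}, \mathbf{a}))$ is true for any value of $\mathbf{a}$.

- Let $\Delta(\mathbf{x}, \mathbf{a})$ be a term with free variables $\mathbf{x}$ and $\mathbf{a}$. Then $\Delta(\mathbf{x}, \mathbf{a})$ is independent
of $\mathbf{x}$ if $\forall \mathbf{x}.\forall \mathbf{y}.(\Delta(\mathbf{x}, \mathbf{a}) = \Delta(\mathbf{y}, \mathbf{a}))$ is true for any value of $\mathbf{a}$.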
Definition 2 of formula and term independence is far stronger than
Definition 1 of interpretation independence. Indeed, it can easily be checked that if a
formula $\Phi$ (resp. a term $\Delta$) is independent of $\mathbf{x}$, then any interpretation of $\Phi$
(resp. $\Delta$) is independent of $\mathbf{x}$. However, the converse is false, as formula $ax + b > 0$
is not independent of x, but has an interpretation {a = 0, b = 1, x = 1} which is.


4.2 Independence Conditions 


Since it is rarely the case that a formula (resp. term) is independent from a set 
of variables x, we are interested in Sufficient Independence Conditions. These 
conditions are additional constraints that can be added to a formula (resp. term) 
in such a way that they make the formula (resp. term) independent of x. 


Definition 3 (Sufficient Independence Condition (SIC)) 


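- A Sufficient Independence Condition for a formula $\Phi(\mathbf{x}, \mathbf{a})$ with regard to $\mathbf{x}$
is a formula $\Psi(\mathbf{a})$ such that $\Psi(\mathbf{a}) \models (\forall \mathbf{x}.\forall \mathbf{y}.\Phi(\mathbf{x}, \mathbf{a}) \Leftrightarrow \Phi(\mathbf{y}, \mathbf{a}))$.

- A Sufficient Independence Condition for a term $\Delta(\mathbf{x}, \mathbf{a})$ with regard to $\mathbf{x}$ is
a formula $\Psi(\mathbf{a})$ such that $\Psi(\mathbf{a}) \models (\forall \mathbf{x}.\forall \mathbf{y}.\Delta(\mathbf{x}, \mathbf{a}) = \Delta(\mathbf{y}, \mathbf{a}))$.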


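We denote by $\mathrm{SIC}_{\Phi,\mathbf{x}}$ (resp. $\mathrm{SIC}_{\Delta,\mathbf{x}}$) a Sufficient Independence Condition for
a formula $\Phi(\mathbf{x}, \mathbf{a})$ (resp. for a term $\Delta(\mathbf{x}, \mathbf{a})$) with regard to $\mathbf{x}$. For example,
$a = 0$ is a $\mathrm{SIC}_{\phi,x}$ for formula $\phi \triangleq ax + b > 0$, and $a = c$ is a $\mathrm{SIC}_{\Delta,t}$ for term
$\Delta \triangleq (t[a \leftarrow b])[c]$. Note that $\bot$ is always a SIC, and that SICs are closed under $\land$
and $\lor$. Proposition 1 clarifies the interest of SICs for model generation.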
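Whether a candidate condition is a SIC can likewise be tested exhaustively on a small finite model. The sketch below is our own illustration, reusing an assumed encoding of arrays as length-3 tuples over 0..2 with a, b, c ranging over 0..2: it checks Definition 3 for $a = c$ with respect to t, confirms that $\bot$ trivially qualifies, and shows that an arbitrary guess such as $b = 0$ fails. Passing such a finite test is evidence, not a proof of the SIC property over the full theory of arrays.

```python
from itertools import product
from typing import Callable

VALUES = range(3)                          # assumed finite domain for a, b, c
ARRAYS = list(product(VALUES, repeat=3))   # all arrays of length 3 over VALUES


def term(t: tuple, a: int, b: int, c: int) -> int:
    """(t[a <- b])[c]: store b at index a, then select index c."""
    written = t[:a] + (b,) + t[a + 1:]
    return written[c]


def is_sic(condition: Callable[[int, int, int], bool]) -> bool:
    """condition(a, b, c) is a SIC w.r.t. t if, whenever it holds, the term
    takes the same value for every array t (Definition 3, finite check)."""
    for a, b, c in product(VALUES, repeat=3):
        if condition(a, b, c):
            if len({term(t, a, b, c) for t in ARRAYS}) > 1:
                return False
    return True


print(is_sic(lambda a, b, c: a == c))   # True
print(is_sic(lambda a, b, c: False))    # True: bottom is always a SIC
print(is_sic(lambda a, b, c: b == 0))   # False
```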


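Proposition 1 (Model generalization). Let $\Phi(\mathbf{x}, \mathbf{a})$ be a formula and $\Psi$ a
$\mathrm{SIC}_{\Phi,\mathbf{x}}$. If there exists an interpretation $\mathcal{I}$ of $\mathbf{x}$ and $\mathbf{a}$ such that $\mathcal{I} \models \Psi(\mathbf{a}) \land \Phi(\mathbf{x}, \mathbf{a})$,
then the restriction of $\mathcal{I}$ to $\mathbf{a}$ satisfies $\forall \mathbf{x}.\Phi(\mathbf{x}, \mathbf{a})$.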


Proof (sketch of). Appendix C.1 of the companion technical report. 


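For the sake of completeness, we now introduce the notion of Weakest
Independence Condition for a formula $\Phi(\mathbf{x}, \mathbf{a})$ (resp. a term $\Delta(\mathbf{x}, \mathbf{a})$) with regard
to $\mathbf{x}$. We will denote such conditions $\mathrm{WIC}_{\Phi,\mathbf{x}}$ (resp. $\mathrm{WIC}_{\Delta,\mathbf{x}}$).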


Definition 4 (Weakest Independence Condition (WIC)) 


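- A Weakest Independence Condition for a formula $\Phi(\mathbf{x}, \mathbf{a})$ with regard to $\mathbf{x}$ is
a $\mathrm{SIC}_{\Phi,\mathbf{x}}$ $\Pi$ such that, for any other $\mathrm{SIC}_{\Phi,\mathbf{x}}$ $\Psi$, $\Psi \models \Pi$.

- A Weakest Independence Condition for a term $\Delta(\mathbf{x}, \mathbf{a})$ with regard to $\mathbf{x}$ is a
$\mathrm{SIC}_{\Delta,\mathbf{x}}$ $\Pi$ such that, for any other $\mathrm{SIC}_{\Delta,\mathbf{x}}$ $\Psi$, $\Psi \models \Pi$.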


Note that $\Omega \triangleq \forall x.\forall y.\ (\Phi(x,a) \Leftrightarrow \Phi(y,a))$ is always a $\mathrm{WIC}_{\Phi,x}$, and any formula $\Pi$ is a $\mathrm{WIC}_{\Phi,x}$ if and only if $\Pi \equiv \Omega$. Therefore all syntactically different WICs have the same semantics. As an example, both SICs $a = 0$ and $a = c$ presented earlier are WICs. Proposition 2 emphasizes the interest of WICs for model generation.
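The characterization of WICs through $\Omega$ can likewise be tested by enumeration. This sketch assumes the same kind of toy array domain as a brute-force SIC check would (two cells holding bits); `omega` and `is_wic` are hypothetical helper names. It confirms that $a = c$ coincides with $\Omega$ for $(t[a \leftarrow b])[c]$, whereas the stronger SIC $a = c \wedge b = 0$ is not weakest.

```python
from itertools import product

def omega(phi, xs, a):
    """Ω(a) ≜ ∀x.∀y. phi(x, a) ⇔ phi(y, a), evaluated on the finite domain xs."""
    return len({phi(x, *a) for x in xs}) == 1

def is_wic(phi, pi, xs, params):
    """pi is a WIC iff it is equivalent to Ω on every parameter tuple."""
    return all(pi(*a) == omega(phi, xs, a) for a in params)

def select_store(t, a, b, c):
    t = list(t)
    t[a] = b          # t[a <- b]
    return t[c]       # ...[c]

arrays = list(product(range(2), repeat=2))   # t: arrays of size 2 over {0, 1}
abc = list(product(range(2), repeat=3))      # parameters a, b, c

print(is_wic(select_store, lambda a, b, c: a == c, arrays, abc))              # True
print(is_wic(select_store, lambda a, b, c: a == c and b == 0, arrays, abc))   # False
```

The second condition is still a SIC, but it rejects parameter values (for example $a = c$, $b = 1$) for which $\Omega$ holds, so it is not weakest.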
Proposition 2 (Model specialization). Let $\Phi(x,a)$ be a formula and $\Pi(a)$ a $\mathrm{WIC}_{\Phi,x}$. If there exists an interpretation $\{a\}$ such that $\{a\} \models \forall x.\, \Phi(x,a)$, then $\{x,a\} \models \Pi(a) \wedge \Phi(x,a)$ for any valuation of $x$.


Proof (sketch of). Appendix C.2 of the companion technical report. 


From now on, our goal is to infer from a formula $\forall x.\, \Phi(x,a)$ a $\mathrm{SIC}_{\Phi,x}$ $\Psi(a)$, find a model for $\Psi(a) \wedge \Phi(x,a)$ and generalize it. This SIC should be as weak (in the sense of “less coercive”) as possible, as otherwise $\bot$ could always be used, which would not be very interesting for our overall purpose.

For the sake of simplicity, the previous definitions do not mention the theory to which the SIC belongs. If the theory $T$ of the quantified formula is decidable, we can always choose $\forall x.\forall y.\ (\Phi(x,a) \Leftrightarrow \Phi(y,a))$ as a SIC, but it is simpler to directly use a $T$-solver. The challenge is, for formulas in an undecidable theory $T$, to find a non-trivial SIC in its quantifier-free fragment QF-T.

Under this constraint, we cannot expect a systematic construction of WICs, as it would allow one to decide the satisfiability of any quantified theory with a decidable quantifier-free fragment. Yet informally, the closer a SIC is to being a WIC, the closer our approach is to completeness. Therefore this notion might be seen as a fair gauge of the quality of a SIC. Having said that, we leave a deeper study of WIC inference as future work.


5 Generic Framework for SIC-Based Model Generation 


We now describe our overall approach. Algorithm 1 presents our SIC-based generic framework for model generation (Sect. 5.1). Then, Algorithm 2 proposes a taint-based approach for SIC inference (Sect. 5.2). Finally, we discuss complexity and efficiency issues (Sect. 5.3) and detail extensions (Sect. 5.4), such as partial elimination.

From now on, we no longer distinguish between terms and formulas, their treatment being symmetric, and we call targeted variables the variables we want to be independent of.


5.1 SIC-Based Model Generation 


Our model generation technique is described in Algorithm 1. Function solveQ takes as input a formula $\forall x.\, \Phi(x,a)$ over a theory $T$. It first calculates a $\mathrm{SIC}_{\Phi,x}$ $\Psi(a)$ in QF-T. Then it solves $\Phi(x,a) \wedge \Psi(a)$. Finally, depending on the result and on whether $\Psi(a)$ is a WIC or not, it answers SAT, UNSAT or UNKNOWN. solveQ is parametrized by two functions, solveQF and inferSIC:


solveQF is a decision procedure (typically an SMT solver) for QF-T. solveQF is said to be correct if each time it answers SAT (resp. UNSAT) the formula is satisfiable (resp. unsatisfiable); it is said to be complete if it always answers SAT or UNSAT, never UNKNOWN.
Algorithm 1. SIC-based model generation for quantified formulas

Parameter: solveQF
  Input: $\Phi(v)$ a formula in QF-T
  Output: SAT($v$) with $v \models \Phi$, UNSAT or UNKNOWN

Parameter: inferSIC
  Input: $\Phi$ a formula in QF-T, and $x$ a set of targeted variables
  Output: $\Psi$ a $\mathrm{SIC}_{\Phi,x}$ in QF-T

Function solveQ:
  Input: $\forall x.\, \Phi(x,a)$ a universally quantified formula over theory $T$
  Output: SAT($a$) with $a \models \forall x.\, \Phi(x,a)$, UNSAT or UNKNOWN
  Let $\Psi(a) \triangleq$ inferSIC($\Phi(x,a)$, $x$)
  match solveQF($\Phi(x,a) \wedge \Psi(a)$)
    with SAT($x,a$) return SAT($a$)
    with UNSAT
      if $\Psi$ is a $\mathrm{WIC}_{\Phi,x}$ then return UNSAT
      else return UNKNOWN
    with UNKNOWN return UNKNOWN

inferSIC takes as input a formula $\Phi$ in QF-T and a set of targeted variables $x$, and produces a $\mathrm{SIC}_{\Phi,x}$ in QF-T. It is said to be correct if it always returns a SIC, and complete if all the SICs it returns are WICs. A possible implementation of inferSIC is described in Algorithm 2 (Sect. 5.2).
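A minimal Python sketch of Algorithm 1, assuming a toy enumeration-based solveQF over finite domains and an inferSIC callback that returns both the SIC and a flag telling whether it is known to be a WIC. This pair-returning interface and all names below are illustrative assumptions. The example uses $\forall x.\, a \cdot x + b > 0$ over 4-bit bitvectors with the hand-written SIC $a = 0$.

```python
from itertools import product

SAT, UNSAT, UNKNOWN = "SAT", "UNSAT", "UNKNOWN"

def brute_force_qf(domains):
    """Toy solveQF: enumerates every valuation of finite domains (never UNKNOWN)."""
    names = list(domains)
    def solve(formula):
        for values in product(*(domains[n] for n in names)):
            v = dict(zip(names, values))
            if formula(v):
                return SAT, v
        return UNSAT, None
    return solve

def solve_q(phi, targets, infer_sic, solve_qf):
    """Sketch of Algorithm 1; infer_sic returns (psi, psi_is_wic)."""
    psi, psi_is_wic = infer_sic(phi, targets)
    status, model = solve_qf(lambda v: psi(v) and phi(v))
    if status == SAT:
        # Proposition 1: dropping the targeted variables gives a model of ∀x.Φ
        return SAT, {k: val for k, val in model.items() if k not in targets}
    if status == UNSAT and psi_is_wic:
        return UNSAT, None
    return UNKNOWN, None

def sic_a_zero(_phi, _targets):
    """Hand-written SIC a = 0 for this example; not flagged as a WIC."""
    return (lambda v: v["a"] == 0), False

# ∀x. a*x + b > 0 over 4-bit unsigned bitvectors (wrap-around modulo 16).
qf = brute_force_qf({"a": range(16), "b": range(16), "x": range(16)})
phi = lambda v: (v["a"] * v["x"] + v["b"]) % 16 > 0
print(solve_q(phi, {"x"}, sic_a_zero, qf))                               # ('SAT', {'a': 0, 'b': 1})
print(solve_q(lambda v: phi(v) and v["b"] == 0, {"x"}, sic_a_zero, qf))  # ('UNKNOWN', None)
```

Because the SIC is not flagged as a WIC, the UNSAT answer of the quantifier-free call in the second example is reported as UNKNOWN, as in Algorithm 1.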


Function solveQ enjoys the following two properties, where correctness and completeness are defined as for solveQF.


Theorem 1 (Correctness and completeness) 


- If solveQF and inferSIC are correct, then solveQ is correct.
- If solveQF and inferSIC are complete, then solveQ is complete.


Proof (sketch of). Follows directly from Propositions 1 and 2 (Sect. 4.2).


5.2 Taint-Based SIC Inference 


Algorithm 2 presents a taint-based implementation of function inferSIC. It consists of a (syntactic) core calculus described here, refined by a (semantic) theory-dependent calculus theorySIC described in Sect. 6. From formula $\Phi(x,a)$ and targeted variables $x$, inferSIC is defined recursively as follows.

If $\Phi$ is a constant, it returns $\top$, as constants are independent of any variable. If $\Phi$ is a variable $v$, it returns $\top$ if we may depend on $v$ (i.e., $v \notin x$), and $\bot$ otherwise. If $\Phi$ is a function $f(\phi_1,\ldots,\phi_n)$, it first recursively computes for every sub-term $\phi_i$ a $\mathrm{SIC}_{\phi_i,x}$ $\Psi_i$. These results are then passed, together with $f$, the $\phi_i$'s and $x$, to theorySIC, which computes a $\mathrm{SIC}_{\Phi,x}$ $\Psi$. The procedure returns the disjunction between $\Psi$ and the conjunction of the $\Psi_i$'s. Note that the default value $\bot$ of theorySIC is absorbed by the disjunction.
Algorithm 2. Taint-based SIC inference

Parameter: theorySIC
  Input: $f$ a function symbol, its parameters $\phi_i$, $x$ a set of targeted variables and $\Psi_i$ their associated $\mathrm{SIC}_{\phi_i,x}$
  Output: $\Psi$ a $\mathrm{SIC}_{f(\phi_i),x}$
  Default: return $\bot$

Function inferSIC($\Phi$, $x$):
  Input: $\Phi$ a formula and $x$ a set of targeted variables
  Output: $\Psi$ a $\mathrm{SIC}_{\Phi,x}$
  either $\Phi$ is a constant: return $\top$
  either $\Phi$ is a variable $v$: return $v \notin x$
  either $\Phi$ is a function $f(\phi_1,\ldots,\phi_n)$:
    Let $\Psi_i \triangleq$ inferSIC($\phi_i$, $x$) for all $i \in \{1,\ldots,n\}$
    Let $\Psi \triangleq$ theorySIC($f$, $(\phi_1,\ldots,\phi_n)$, $(\Psi_1,\ldots,\Psi_n)$, $x$)
    return $\Psi \vee \bigwedge_i \Psi_i$


The intuition is that if the $\phi_i$'s are independent of $x$, then so is $f(\phi_1,\ldots,\phi_n)$. Therefore Algorithm 2 is said to be taint-based since, when theorySIC is left to its default value, it acts as a form of taint tracking [15,27] inside the formula.
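The core calculus of Algorithm 2 fits in a few lines. In the sketch below, terms are encoded as tuples, `True`/`False` stand for $\top$/$\bot$, and the smart constructors `mk_and`/`mk_or` absorb $\top$ and $\bot$; this encoding is an assumption for illustration, not TFML's. With the default theorySIC, the sketch performs plain taint tracking and returns $\bot$ for $a \cdot x + b > 0$. Plugging in an absorbing-element rule for multiplication, in the style of the bitvector rules of Sect. 6.1, recovers the SIC $a = 0$.

```python
# Hypothetical encoding: ("const", v), ("var", name), ("app", f, [args]).
# SICs use Python True/False for ⊤/⊥; compound SICs are tuples.

def mk_and(*fs):
    """Conjunction with ⊤/⊥ simplification."""
    out = [f for f in fs if f is not True]
    if any(f is False for f in out):
        return False
    if not out:
        return True
    return out[0] if len(out) == 1 else ("and", *out)

def mk_or(*fs):
    """Disjunction with ⊤/⊥ simplification (absorbs the default ⊥)."""
    out = [f for f in fs if f is not False]
    if any(f is True for f in out):
        return True
    if not out:
        return False
    return out[0] if len(out) == 1 else ("or", *out)

def default_theory_sic(f, args, sics, targets):
    return False  # default value ⊥: plain taint tracking

def infer_sic(phi, targets, theory_sic=default_theory_sic):
    """Core calculus of Algorithm 2."""
    if phi[0] == "const":
        return True                          # constants depend on nothing
    if phi[0] == "var":
        return phi[1] not in targets         # ⊤ iff v ∉ x
    _, f, args = phi
    sics = [infer_sic(t, targets, theory_sic) for t in args]
    return mk_or(theory_sic(f, args, sics, targets), mk_and(*sics))  # Ψ ∨ ⋀ Ψ_i

def mul_sic(f, args, sics, targets):
    """(a × b)• ≜ (a• ∧ a = 0) ∨ (b• ∧ b = 0); ⊥ for every other symbol."""
    if f != "mul":
        return False
    (a, b), (sa, sb) = args, sics
    zero = ("const", 0)
    return mk_or(mk_and(sa, ("eq", a, zero)), mk_and(sb, ("eq", b, zero)))

a, b, x = ("var", "a"), ("var", "b"), ("var", "x")
phi = ("app", "gt", [("app", "add", [("app", "mul", [a, x]), b]), ("const", 0)])
print(infer_sic(phi, {"x"}))            # False, i.e. ⊥
print(infer_sic(phi, {"x"}, mul_sic))   # ('eq', ('var', 'a'), ('const', 0))
```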


Proposition 3 (Correctness). Given a formula $\Phi(x,a)$ and assuming that theorySIC is correct, inferSIC$(\Phi,x)$ indeed computes a $\mathrm{SIC}_{\Phi,x}$.


Proof (sketch of). This proof has been mechanized in Coq¹.


Note that on the other hand, completeness does not hold: in general inferSIC 
does not compute a WIC, cf. discussion in Sect. 5.4. 


5.3 Complexity and Efficiency 


We now evaluate the overhead induced by Algorithm 1 in terms of formula size and complexity of the resolution, the running time of Algorithm 1 itself being expected to be negligible (preprocessing).


Definition 5. The size of a term is inductively defined as $\mathit{size}(x) \triangleq 1$ for $x$ a variable, and $\mathit{size}(f(t_1,\ldots,t_n)) \triangleq 1 + \sum_i \mathit{size}(t_i)$ otherwise. We say that theorySIC is bounded in size if there exists $K$ such that, for all terms $\Lambda$,

$$\mathit{size}(\mathrm{theorySIC}(\Lambda,\cdot)) \leq K.$$


Proposition 4 (Size bound). Let $N$ be the maximal arity of symbols defined by theory $T$. If theorySIC is bounded in size by $K$, then for all formulas $\Phi$ in $T$, $\mathit{size}(\mathrm{inferSIC}(\Phi,\cdot)) \leq (K + N) \cdot \mathit{size}(\Phi)$.
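Definition 5 and the bound of Proposition 4 can be evaluated directly. This sketch encodes terms as tuples (an illustrative assumption) and treats constants as 0-ary symbols of size 1. The values $K = 5$ and $N = 2$ are arbitrary illustrative choices for a toy signature with at most binary symbols, not figures from the paper.

```python
def size(t):
    """Definition 5; constants are treated as 0-ary symbols, hence of size 1."""
    if t[0] in ("var", "const"):
        return 1
    return 1 + sum(size(s) for s in t[2])   # t = ("app", f, args)

def prop4_bound(phi, k, n):
    """Right-hand side (K + N) · size(Φ) of Proposition 4."""
    return (k + n) * size(phi)

a, b, x = ("var", "a"), ("var", "b"), ("var", "x")
phi = ("app", "gt", [("app", "add", [("app", "mul", [a, x]), b]), ("const", 0)])
print(size(phi))               # 7
print(prop4_bound(phi, 5, 2))  # 49
```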


1 http://benjamin.farinier.org/cav2018/. 
Proposition 5 (Complexity bound). Let us suppose theorySIC bounded in size, and let $\Phi$ be a formula belonging to a theory $T$ with polynomial-time checkable solutions. If $\Psi$ is a $\mathrm{SIC}_{\Phi,x}$ produced by inferSIC, then a solution for $\Phi \wedge \Psi$ is checkable in time polynomial in the size of $\Phi$.


Proof (sketch of). Appendices C.3 and C.4 of the companion technical report. 


These propositions demonstrate that, for formulas landing in complex enough theories, our method lifts QF-solvers to the quantified case (in an approximated way) without any significant overhead, as long as theorySIC is bounded in size. This latter constraint can be achieved by systematically binding sub-terms to (constant-size) fresh names and having theorySIC manipulate these binders.


5.4 Discussions 


Extension. Let us remark that our framework encompasses partial quantifier 
elimination as long as the remaining quantifiers are handled by solveQF. For 
example, we may want to remove quantifications over arrays but keep those on 
bitvectors. In this setting, inferSIC can also allow some level of quantification, 
providing that solveQF handles them. 


About WIC. As already stated, inferSIC does not propagate WICs in general. For example, considering formulas $t_1 \triangleq (x < 0)$ and $t_2 \triangleq (x \geq 0)$, we have $\mathrm{WIC}_{t_1,x} = \bot$ and $\mathrm{WIC}_{t_2,x} = \bot$. Hence inferSIC returns $\bot$ as SIC for $t_1 \vee t_2$, while actually $\mathrm{WIC}_{t_1 \vee t_2,x} = \top$.

Nevertheless, we can already highlight a few cases where WICs can be computed. (1) inferSIC does propagate WICs on one-to-one uninterpreted functions. (2) If no variable of $x$ appears in any sub-term of $f(t,t')$, then the associated WIC is $\top$. While a priori naive, this case becomes interesting when combined with simplifications (Sect. 7.1) that may eliminate $x$. (3) If a sub-term falls in a sub-theory admitting quantifier elimination, then the associated WIC is computed by eliminating quantifiers in $(\forall x.\forall y.\ \Phi(x,a) \Leftrightarrow \Phi(y,a))$. (4) We may also think of dedicated patterns: regarding bitvectors, the WIC for $x < a \Rightarrow x < x + k$ is $a < \mathit{Max} - k$. Identifying under which conditions WIC propagation holds is a strong direction for future work.


6 Theory-Dependent SIC Refinements 


We now present theory-dependent SIC refinements for theories relevant to program analysis: booleans, fixed-size bitvectors and arrays (recall that uninterpreted functions are already handled by Algorithm 2). We then propose a generalization of these refinements together with a correctness proof for a larger class of operators.
6.1 Refinement on Theories


We recall that theorySIC takes four parameters: a function symbol $f$, its arguments $(t_1,\ldots,t_n)$, their associated SICs $(t_1^\bullet,\ldots,t_n^\bullet)$, and targeted variables $x$. theorySIC pattern-matches the function symbol and returns the associated SIC according to the rules in Fig. 2. If a function symbol is not supported, we return the default value $\bot$. Constants and variables are handled by inferSIC. For the sake of simplicity, the rules in Fig. 2 are defined recursively, but they can easily fit the interface required for theorySIC in Algorithm 2 by turning recursive calls into parameters.


Booleans and ite. Rules for the boolean theory (Fig. 2a) handle $\Rightarrow$, $\wedge$, $\vee$ and ite (if-then-else). For binary operators, each disjunct of the SIC is the conjunction of the SIC associated with one of the two sub-terms and a constraint on this sub-term that forces the result of the operator to be constant, e.g., to be equal to $\bot$ (resp. $\top$) for the antecedent (resp. consequent) of an implication. These equality constraints are based on absorbing elements of operators.

Inference for the ite operator is more subtle. Intuitively, if its condition is independent of $x$, we use it to select the SIC of the sub-term that will be selected by the ite operator. If the condition depends on $x$, then we can no longer use it to select a SIC. In this case, we return the conjunction of the SICs of both sub-terms and the constraint that the two sub-terms are equal.
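The boolean and ite rules of Fig. 2a can be plugged into a taint-style inference as a theorySIC function. The self-contained Python sketch below encodes terms as tuples and uses `True`/`False` for $\top$/$\bot$ (illustrative assumptions); the ite condition is kept symbolic in the returned SIC. For $x \Rightarrow q$ with $x$ targeted it yields $q = \top$; for $\mathit{ite}\ p\ y\ \top$ with $y$ targeted it yields $\mathit{ite}\ p\ \bot\ \top$, i.e., $\neg p$; and for $\mathit{ite}\ x\ p\ q$ with $x$ targeted it falls back to $p = q$.

```python
# Hypothetical encoding: ("const", v), ("var", name), ("app", f, [args]).
# SICs use Python True/False for ⊤/⊥; compound SICs are tuples.
TOP, BOT = ("const", True), ("const", False)

def mk_and(*fs):
    """Conjunction with ⊤/⊥ simplification."""
    out = [f for f in fs if f is not True]
    if any(f is False for f in out):
        return False
    if not out:
        return True
    return out[0] if len(out) == 1 else ("and", *out)

def mk_or(*fs):
    """Disjunction with ⊤/⊥ simplification."""
    out = [f for f in fs if f is not False]
    if any(f is True for f in out):
        return True
    if not out:
        return False
    return out[0] if len(out) == 1 else ("or", *out)

# Absorbing value required of the (left, right) operand of each binary connective.
ABSORBING = {"and": (BOT, BOT), "or": (TOP, TOP), "implies": (BOT, TOP)}

def bool_theory_sic(f, args, sics, targets):
    """Rules of Fig. 2a; returns ⊥ (False) for unsupported symbols."""
    if f in ABSORBING:
        (a, b), (sa, sb) = args, sics
        za, zb = ABSORBING[f]
        return mk_or(mk_and(sa, ("eq", a, za)), mk_and(sb, ("eq", b, zb)))
    if f == "ite":
        (c, a, b), (sc, sa, sb) = args, sics
        return mk_or(mk_and(sc, ("ite", c, sa, sb)), mk_and(sa, sb, ("eq", a, b)))
    return False

def infer_sic(phi, targets):
    """Algorithm 2 with bool_theory_sic as theorySIC."""
    if phi[0] == "const":
        return True
    if phi[0] == "var":
        return phi[1] not in targets
    _, f, args = phi
    sics = [infer_sic(t, targets) for t in args]
    return mk_or(bool_theory_sic(f, args, sics, targets), mk_and(*sics))

x, p, q, y = (("var", n) for n in "xpqy")
print(infer_sic(("app", "implies", [x, q]), {"x"}))   # ('eq', ('var', 'q'), ('const', True))
print(infer_sic(("app", "ite", [p, y, TOP]), {"y"}))  # ('ite', ('var', 'p'), False, True)
print(infer_sic(("app", "ite", [x, p, q]), {"x"}))    # ('eq', ('var', 'p'), ('var', 'q'))
```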


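$$(a \Rightarrow b)^\bullet \triangleq (a^\bullet \wedge a = \bot) \vee (b^\bullet \wedge b = \top)$$
$$(a \wedge b)^\bullet \triangleq (a^\bullet \wedge a = \bot) \vee (b^\bullet \wedge b = \bot)$$
$$(a \vee b)^\bullet \triangleq (a^\bullet \wedge a = \top) \vee (b^\bullet \wedge b = \top)$$
$$(\mathit{ite}\ c\ a\ b)^\bullet \triangleq (c^\bullet \wedge \mathit{ite}\ c\ a^\bullet\ b^\bullet) \vee (a^\bullet \wedge b^\bullet \wedge a = b)$$

(a) Booleans and ite

$$(a_n \wedge b_n)^\bullet \triangleq (a_n^\bullet \wedge a_n = 0_n) \vee (b_n^\bullet \wedge b_n = 0_n)$$
$$(a_n \vee b_n)^\bullet \triangleq (a_n^\bullet \wedge a_n = 1_n) \vee (b_n^\bullet \wedge b_n = 1_n)$$
$$(a_n \times b_n)^\bullet \triangleq (a_n^\bullet \wedge a_n = 0_n) \vee (b_n^\bullet \wedge b_n = 0_n)$$
$$(a_n \ll b_n)^\bullet \triangleq (b_n^\bullet \wedge b_n \geq n)$$

(b) Fixed-size bitvectors

$$\begin{aligned}
(\mathit{select}\ (\mathit{store}\ a\ i\ e)\ j)^\bullet
&\triangleq (\mathit{ite}\ (i = j)\ e\ (\mathit{select}\ a\ j))^\bullet \\
&\triangleq ((i = j)^\bullet \wedge \mathit{ite}\ (i = j)\ e^\bullet\ (\mathit{select}\ a\ j)^\bullet) \vee (e^\bullet \wedge (\mathit{select}\ a\ j)^\bullet \wedge e = \mathit{select}\ a\ j) \\
&\Leftarrow (i^\bullet \wedge j^\bullet \wedge \mathit{ite}\ (i = j)\ e^\bullet\ (\mathit{select}\ a\ j)^\bullet) \vee (e^\bullet \wedge (\mathit{select}\ a\ j)^\bullet \wedge e = \mathit{select}\ a\ j)
\end{aligned}$$

(c) Arrays

Fig. 2. Examples of refinements for theorySIC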


Bitvectors and Arrays. Rules for bitvectors (Fig. 2b) follow similar ideas, with constant $\top$ (resp. $\bot$) replaced by $1_n$ (resp. $0_n$), the bitvector of size $n$ full of ones (resp. zeros). Rules for arrays (Fig. 2c) are derived from the theory axioms. The definition is recursive: rules need to be applied until reaching either a store at the position where the select occurs, or the initial array variable.

As a rule of thumb, good SICs can be derived from function axioms in the form of rewriting rules, as done for arrays. Similar constructions can be obtained, for example, for stacks or queues.
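The recursive array rule of Fig. 2c can be sketched as follows, with a hypothetical tuple encoding and `True`/`False` for $\top$/$\bot$. The helper `mk_ite` only simplifies constant branches, and the base case (a read on the initial array) falls back to plain taint. On $(t[a \leftarrow b])[c]$ with $t$ targeted, the sketch returns the SIC $a = c$ given earlier; with $b$ targeted instead, it returns $\mathit{ite}\ (a = c)\ \bot\ \top$, i.e., $a \neq c$.

```python
# Hypothetical encoding: ("const", v), ("var", n), ("store", a, i, e), ("select", a, j).
# SICs use Python True/False for ⊤/⊥; compound SICs are tuples.

def mk_and(*fs):
    """Conjunction with ⊤/⊥ simplification."""
    out = [f for f in fs if f is not True]
    if any(f is False for f in out):
        return False
    if not out:
        return True
    return out[0] if len(out) == 1 else ("and", *out)

def mk_or(*fs):
    """Disjunction with ⊤/⊥ simplification."""
    out = [f for f in fs if f is not False]
    if any(f is True for f in out):
        return True
    if not out:
        return False
    return out[0] if len(out) == 1 else ("or", *out)

def mk_ite(c, then_sic, else_sic):
    """ite over SICs, simplifying constant branches."""
    if then_sic == else_sic:
        return then_sic
    if then_sic is True and else_sic is False:
        return c
    return ("ite", c, then_sic, else_sic)

def sic(t, targets):
    """Fig. 2c rule for select over store, applied down the store chain."""
    kind = t[0]
    if kind == "const":
        return True
    if kind == "var":
        return t[1] not in targets
    if kind == "select" and t[1][0] == "store":
        (_, base, i, e), j = t[1], t[2]
        inner = ("select", base, j)                 # read below this store
        si, sj, se, s_in = (sic(u, targets) for u in (i, j, e, inner))
        left = mk_and(si, sj, mk_ite(("eq", i, j), se, s_in))
        right = mk_and(se, s_in, ("eq", e, inner))
        return mk_or(left, right)
    if kind == "select":                            # read on the initial array
        return mk_and(sic(t[1], targets), sic(t[2], targets))
    raise ValueError(f"unsupported term: {kind}")

t, a, b, c = (("var", n) for n in "tabc")
lam = ("select", ("store", t, a, b), c)
print(sic(lam, {"t"}))   # ('eq', ('var', 'a'), ('var', 'c'))
print(sic(lam, {"b"}))   # ('ite', ('eq', ('var', 'a'), ('var', 'c')), False, True)
```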


6.2 R-Absorbing Functions 


We propose a generalization of the previous theory-dependent SIC refinements 
to a larger class of functions, and prove its correctness. 
Intuitively, if a function has an absorbing element, constraining one of its operands to be equal to this element ensures that the result of the function is independent of the other operands. However, this is not enough when a relation between some elements is needed, such as with $(t[a \leftarrow b])[c]$, where the constraint $a = c$ ensures independence with regard to $t$. We thus generalize the notion of absorption to $R$-absorption, where $R$ is a relation between function arguments.


Definition 6. Let $f : \tau_1 \times \cdots \times \tau_n \to \tau$ be a function. $f$ is $R$-absorbing if there exist $I_R \subseteq \{1,\ldots,n\}$ and $R$ a relation between the $a_i : \tau_i$, $i \in I_R$, such that, for all $b \triangleq (b_1,\ldots,b_n)$ and $c \triangleq (c_1,\ldots,c_n) \in \tau_1 \times \cdots \times \tau_n$, if $R(b|_{I_R})$ and $b|_{I_R} = c|_{I_R}$, where $\cdot|_{I_R}$ is the projection on $I_R$, then $f(b) = f(c)$.

$I_R$ is called the support of the relation of absorption $R$.


For example, $(a,b) \mapsto a \vee b$ has two pairs $(R, I_R)$ coinciding with the usual notion of absorption, $(a = \top, \{1\})$ and $(b = \top, \{2\})$. Function $(x,y,z) \mapsto xy + z$ has, among others, the pair $(x = 0, \{1,3\})$, while $(a,b,c,t) \mapsto (t[a \leftarrow b])[c]$ has the pair $(a = c, \{1,2,3\})$. We can now state the following proposition:
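Definition 6 can be checked exhaustively on small finite domains, which is a convenient way to test candidate pairs $(R, I_R)$. In this hypothetical Python helper, argument indices are 0-based (the paper counts from 1), and the domains (two values for booleans and array indices, $\{0,\ldots,3\}$ for integers, size-2 bit arrays) are illustrative assumptions. The last check shows why the support of the array example must include $b$.

```python
from itertools import product

def is_r_absorbing(f, domains, rel, support):
    """Exhaustive check of Definition 6 on finite domains.

    support: 0-based argument indices (the paper numbers arguments from 1);
    rel: predicate receiving the arguments at those indices, in order.
    """
    points = list(product(*domains))
    for b in points:
        b_proj = tuple(b[i] for i in support)
        if not rel(*b_proj):
            continue
        for c in points:
            if tuple(c[i] for i in support) == b_proj and f(*b) != f(*c):
                return False   # b and c agree on the support, yet f differs
    return True

def select_store(a, b, c, t):
    t = list(t)
    t[a] = b          # t[a <- b]
    return t[c]       # ...[c]

bits, ints = (False, True), range(4)
arrays = list(product(range(2), repeat=2))
doms = [range(2), range(2), range(2), arrays]
print(is_r_absorbing(lambda a, b: a or b, [bits, bits], lambda a: a, (0,)))                # True
print(is_r_absorbing(lambda x, y, z: x * y + z, [ints] * 3, lambda x, z: x == 0, (0, 2)))  # True
print(is_r_absorbing(select_store, doms, lambda a, b, c: a == c, (0, 1, 2)))               # True
print(is_r_absorbing(select_store, doms, lambda a, c: a == c, (0, 2)))                     # False
```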


Proposition 6. Let $f(t_1,\ldots,t_n)$ be an $R$-absorbing function of support $I_R$, and let $t_i^\bullet$ be a $\mathrm{SIC}_{t_i,x}$ for some $x$. Then $R(t_i)_{i \in I_R} \wedge \bigwedge_{i \in I_R} t_i^\bullet$ is a $\mathrm{SIC}_{f,x}$.


Proof (sketch of). Appendix C.5 of the companion technical report. 


Previous examples (Sect. 6.1) can be recast in terms of R-absorbing functions, proving their correctness (cf. companion technical report). Note that, regarding our end-goal, we should accept only R-absorbing functions in QF-T.


7 Experimental Evaluation 


This section describes the implementation of our method (Sect. 7.1) for bitvectors and arrays (ABV), together with experimental evaluation (Sect. 7.2).


7.1 Implementation 


Our prototype TFML (Taint engine for ForMuLa)² comprises 7 klocs of OCaml. Given an input formula in the SMT-LIB format [5] (ABV theory), TFML performs several normalizations before adding taint information following Algorithm 1. The process ends with simplifications, as taint usually introduces many constant values, and a new SMT-LIB formula is output.


Sharing with Let-Binding. This stage is crucial as it allows us to avoid term duplication in theorySIC (Algorithm 2, Sect. 5.3, and Proposition 4). We introduce new names for relevant sub-terms in order to easily share them.
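Sharing through let-binding amounts to hash-consing: each distinct compound sub-term receives a fresh name once, and later occurrences reuse that name. The Python sketch below is a hypothetical illustration (tuple-encoded terms, fresh names `_let0`, `_let1`, and so on), not TFML's OCaml implementation.

```python
def share(t, bindings, names):
    """Bottom-up: give every distinct compound sub-term a fresh name.

    bindings collects (name, definition) pairs in dependency order;
    names maps an already-seen definition to its variable.
    """
    if t[0] != "app":
        return t                                  # variables and constants stay as-is
    _, f, args = t
    node = ("app", f, tuple(share(s, bindings, names) for s in args))
    if node not in names:
        names[node] = ("var", f"_let{len(names)}")
        bindings.append((names[node][1], node))
    return names[node]

a, x = ("var", "a"), ("var", "x")
ax = ("app", "mul", (a, x))
term = ("app", "add", (ax, ax))      # a*x occurs twice
bindings = []
root = share(term, bindings, {})
print(root)                          # ('var', '_let1')
for name, definition in bindings:    # a*x is bound only once
    print(name, definition)
```

A size-bounded theorySIC can then refer to `_let0` instead of copying `a*x`, which is what keeps the bound of Proposition 4 in reach.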


Simplifications. We perform constant propagation and rewriting (standard rules, e.g. $x - x \mapsto 0$ or $x \times 1 \mapsto x$) on both initial and transformed formulas; equality is soundly approximated by syntactic equality.


2 http://benjamin.farinier.org/cav2018/.
Shadow Arrays. We encode taint constraints over arrays through shadow
arrays. For each array declared in the formula, we declare a (taint) shadow 
array. The default value for all cells of the shadow array is the taint of the 
original array, and for each value stored (resp. read) in the original array, we 
store (resp. read) the taint of the value in the shadow array. As logical arrays 
are infinite, we cannot constrain all the values contained in the initial shadow 
array. Instead, we rely on a common trick in array theory: we constrain only 
cells corresponding to a relevant read index in the formula. 
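A simplified sketch of the shadow-array encoding, under assumptions of our own: terms are tuples, a taint is a boolean term that is `True` when independent of the targeted variables, and the taint of a store index is ignored (a complete encoding would also have to account for it). Reads on the original array are recorded so that only the corresponding cells of the initial shadow array get constrained to the array's own taint.

```python
# Hypothetical encoding: ("const", v), ("var", n), ("store", a, i, e),
# ("select", a, j), ("app", f, args) with args a tuple.
# Simplification: taints of store indices are ignored in this sketch.

def root_of(a):
    """Name of the array variable at the bottom of a store chain."""
    while a[0] == "store":
        a = a[1]
    return a[1]

def shadow(a, targets, reads):
    """Shadow of array term a: same store chain, storing taints of stored values."""
    if a[0] == "var":
        return ("var", a[1] + "_taint")
    _, base, i, e = a
    return ("store", shadow(base, targets, reads), i, taint(e, targets, reads))

def taint(t, targets, reads):
    """Taint term of scalar t; records (array, index) reads in `reads`."""
    kind = t[0]
    if kind == "const":
        return ("const", True)
    if kind == "var":
        return ("const", t[1] not in targets)
    if kind == "select":
        _, a, j = t
        reads.add((root_of(a), j))
        return ("and", ("select", shadow(a, targets, reads), j), taint(j, targets, reads))
    if kind == "app":
        return ("and",) + tuple(taint(s, targets, reads) for s in t[2])
    raise ValueError(f"unsupported term: {kind}")

def initial_shadow_constraints(reads, targets):
    """Constrain only the initial shadow cells actually read to the array's taint."""
    return [("eq", ("select", ("var", r + "_taint"), j), ("const", r not in targets))
            for r, j in sorted(reads, key=repr)]

t, a, b, c = (("var", n) for n in "tabc")
reads = set()
print(taint(("select", ("store", t, a, b), c), {"t"}, reads))
print(initial_shadow_constraints(reads, {"t"}))
```

Under the single constraint `t_taint[c] = False`, the returned taint evaluates to true exactly when $a = c$, which matches the SIC discussed for this term in Sect. 4.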


Iterative Skolemization. While we have assumed throughout the paper that we work on skolemized formulas, we have to be more careful in practice. Indeed, skolemization introduces dependencies between a skolemized variable and all its preceding universally quantified variables, blurring our analysis and likely resulting in the whole formula being considered dependent. Instead, we follow an iterative process: 1. Skolemize the first block of existentially quantified variables; 2. Compute the independence condition for any targeted variable in the first block of universal quantifiers and remove these quantifiers; 3. Repeat. This results in full Skolemization together with the construction of an independence condition, while avoiding many unnecessary dependencies.
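The benefit of the iterative process can be seen on quantifier prefixes alone. Under our reading of the three steps above, each universal block is eliminated through an independence condition before the next existential block is skolemized, so Skolem functions need not take earlier universals as arguments. The hypothetical helper below only compares dependency sets; it does not compute the independence conditions themselves.

```python
def skolem_arguments(prefix, iterative):
    """For each existential variable, the universals its Skolem function takes.

    prefix: list of ("forall" | "exists", [names]) blocks, outermost first.
    Standard Skolemization makes each existential depend on every preceding
    universal; in the iterative scheme, universal blocks are removed (through
    an independence condition) before the next existential block is handled.
    """
    deps, universals = {}, []
    for quant, names in prefix:
        if quant == "forall":
            if not iterative:
                universals.extend(names)
        else:
            for n in names:
                deps[n] = list(universals)
    return deps

prefix = [("exists", ["a"]), ("forall", ["x"]), ("exists", ["b"]),
          ("forall", ["y"]), ("exists", ["c"])]
print(skolem_arguments(prefix, iterative=False))  # {'a': [], 'b': ['x'], 'c': ['x', 'y']}
print(skolem_arguments(prefix, iterative=True))   # {'a': [], 'b': [], 'c': []}
```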


7.2 Evaluation 


Objective. We experimentally evaluate the following research questions: RQ1 
How does our approach perform with regard to state-of-the-art approaches for 
model generation of quantified formulas? RQ2 How effective is it at lifting 
quantifier-free solvers into (SAT-only) quantified solvers? RQ3 How efficient is 
it in terms of preprocessing time and formula size overhead? We evaluate our 
method on a set of formulas combining arrays and bitvectors (paramount in 
software verification), against state-of-the-art solvers for these theories. 


Protocol. The experimental setup below runs on an Intel(R) Xeon(R) E5-2660 
v3 @ 2.60 GHz, 4GB RAM per process, and a TIMEOUT of 1000s per formula. 


Table 1. Answers and resolution time (in seconds, including TIMEOUT)

| | | Boolector• | CVC4 | CVC4• | CVC4_E | CVC4_E• | Z3 | Z3• | Z3_E | Z3_E• |
|---|---|---|---|---|---|---|---|---|---|---|
| SMT-LIB | #SAT | 399 | 84 | 242 | 84 | 242 | 261 | 366 | 87 | 366 |
| | #UNSAT | N/A | 0 | N/A | 0 | N/A | 165 | N/A | 0 | N/A |
| | #UNKNOWN | 870 | 1185 | 1027 | 1185 | 1027 | 843 | 903 | 1182 | 903 |
| | Total time | 349 | 165 | 194 667 | 165 | 196 934 | 270 150 | 36 480 | 192 | 41 935 |
| BINSEC | #SAT | 1042 | 951 | 954 | 951 | 954 | 953 | 1042 | 953 | 1042 |
| | #UNSAT | N/A | 62 | N/A | 62 | N/A | 319 | N/A | 62 | N/A |
| | #UNKNOWN | 379 | 408 | 467 | 408 | 467 | 149 | 379 | 406 | 379 |
| | Total time | 1152 | 64 761 | 76 811 | 64 772 | 77 009 | 30 235 | 11 415 | 135 | 11 604 |

Solver•: solver enhanced with our method. Z3_E, CVC4_E: essentially E-matching.
Metrics. For RQ1, we compare the number of SAT and UNKNOWN answers between solvers supporting quantification, with and without our approach. For RQ2, we compare the number of SAT and UNKNOWN answers between quantifier-free solvers enhanced by our approach and solvers supporting quantification. For RQ3, we measure preprocessing time and formula size overhead.

Benchmarks. We consider two sets of ABV formulas. First, a set of 1421 formulas from (a modified version of) the symbolic execution tool BINSEC [12] representing quantified reachability queries (cf. Sect. 2) over BINSEC benchmark programs (security challenges, e.g. crackme or vulnerability finding). The initial (array) memory is quantified so that models depend only on user input. Second, a set of 1269 ABV formulas generated from formulas of the QF-ABV category of SMT-LIB [5] (sub-categories brummayerbiere, dwp formulas and klee selected). The generation process consists of universally quantifying some of the initial array variables, mimicking quantified reachability problems.

Competitors. For RQ1, we compete against the two state-of-the-art SMT solvers for quantified formulas, CVC4 [4] (finite model instantiation [31]) and Z3 [14] (model-based instantiation [20]). We also consider degraded versions CVC4_E and Z3_E that roughly represent standard E-matching [16]. For RQ2, we use Boolector [10], one of the very best QF-ABV solvers.


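Table 2. Complementarity of our approach with existing solvers (SAT instances)

| | | CVC4•/Z3•: − | CVC4•/Z3•: + | CVC4•/Z3•: [∪] | Boolector•: − | Boolector•: + | Boolector•: [∪] |
|---|---|---|---|---|---|---|---|
| SMT-LIB | CVC4 | −10 | +168 | [252] | −10 | +325 | [409] |
| SMT-LIB | Z3 | −119 | +224 | [485] | −86 | +224 | [485] |
| BINSEC | CVC4 | −25 | +28 | [979] | −25 | +116 | [1067] |
| BINSEC | Z3 | −25 | +114 | [1067] | −25 | +114 | [1067] |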
Results. Tables 1 and 2 and Fig. 3 sum up our experimental results, which have all been cross-checked for consistency. Table 1 reports the number of successes (SAT or UNSAT) and failures (UNKNOWN), plus total solving times. The • sign indicates formulas preprocessed with our approach. In that case it is impossible to correctly answer UNSAT (no WIC checking), so the UNSAT line is N/A. Since Boolector does not support quantified ABV formulas, we only give results with our approach enabled. Table 1 reads as follows: of the 1269 SMT-LIB formulas, standalone Z3 solves 426 formulas (261 SAT, 165 UNSAT), and 366 (all SAT) if preprocessed. Interestingly, our approach always improves the underlying solver in terms of solved (SAT) instances, either in a significant way (SMT-LIB) or in a modest way (BINSEC). Yet, recall that in a software verification setting every win matters (possibly a new bug found or a new assertion proved). For Z3•, it also strongly reduces computation time. Last but not least, Boolector• (a pure QF-solver) turns out to have the best performance on SAT instances, beating state-of-the-art approaches both in terms of solved instances and computation time.
Table 2 substantiates the complementarity of the different methods, and reads as follows: for SMT-LIB, Boolector• solves 224 (SAT) formulas missed by Z3, while Z3 solves 86 (SAT) formulas missed by Boolector•, and 485 (SAT) formulas are solved by either one of them.
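The bracketed totals of Table 2 can be recomputed from the SAT counts of Table 1. For instance, on SMT-LIB, Z3 solves 261 SAT formulas and Boolector• solves 399, so the number of SAT formulas solved by at least one of them is

$$|\mathrm{Z3} \cup \mathrm{Boolector}^\bullet| = 261 + 224 = 399 + 86 = 485.$$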


Figure 3 shows formula size averaging a 9-fold increase (min 3, max 12): yet the transformed formulas are easier to solve because they are more constrained. Regarding performance and overhead of the tainting process, taint time is almost always less than 1s in our experiments (not shown here), with 4 min in the worst case, clearly dominated by resolution time. The worst case is due to a pass of linearithmic complexity which can be optimized to be logarithmic.

[Figure: overhead in formula size plotted against original file size (bits), logarithmic axes]

Maximal size ratio 12.48
Minimal size ratio 2.81
Average size ratio 8.73
Standard deviation 0.78

Fig. 3. Overhead in formula size


Pearls. We show hereafter two particular applications of our method. Table 3 reports results of another symbolic execution experiment, on the GRUB example. On this example, Boolector• completely outperforms existing approaches. As a second application, while the main drawback of our method is that it precludes proving UNSAT, this is easily mitigated by complementing the approach with another one geared to (or capable of) proving UNSAT, yielding efficient solvers for quantified formulas, as shown in Table 4.

Table 3. GRUB example

| | Boolector• | Z3 |
|---|---|---|
| #SAT | … | … |
| #UNSAT | N/A | 42 |
| #UNKNOWN | … | … |
| Total time | 16 732 | 159 765 |


Table 4. Best approaches

| | | Former: Z3 | New: Boolector• | New: Boolector• ▷ Z3 |
|---|---|---|---|---|
| SMT-LIB | #SAT | 261 | 399 | 485 |
| | #UNSAT | 165 | N/A | 165 |
| | #UNKNOWN | 843 | 870 | 619 |
| | Time | 270 150 | 349 | 94 610 |
| BINSEC | #SAT | 953 | 1042 | 1067 |
| | #UNSAT | 319 | N/A | 319 |
| | #UNKNOWN | 149 | 379 | 35 |
| | Time | 30 235 | 1152 | … |

Conclusion. Experiments demonstrate the relevance of our taint-based technique for model generation. (RQ1) Results in Table 1 show that our approach greatly facilitates the resolution process. On these examples, our method performs better than state-of-the-art solvers but also strongly complements them (Table 2). (RQ2) Moreover, Table 1 demonstrates that our technique is highly effective at lifting quantifier-free solvers to quantified formulas, in both number of SAT answers and computation time. Indeed, once lifted, Boolector performs better (for SAT-only) than Z3 or CVC4 with full quantifier support. Finally (RQ3), our tainting method itself is very efficient both in time and space, making it perfect either for a preprocessing step or for a deeper integration into a solver. In our current prototype implementation, we consider the cost to be low. The companion technical report contains a few additional experiments on bitvectors and integer arithmetic, including the example from Fig. 1.


8 Related Work 


Traditional approaches to solving quantified formulas essentially involve either 
generic methods geared to proving unsatisfiability and validity [16], or complete 
but dedicated approaches for particular theories [8,36]. Besides, some recent
methods [20,22,31] aim to be correct and complete for larger classes of theories.


Generic Methods for Unsatisfiability. Broadly speaking, these methods iteratively instantiate axioms until a contradiction is found. They are generic w.r.t. the underlying theory and allow reusing standard theory solvers, but termination is not guaranteed. Also, they are more suited to proving unsatisfiability than to finding models. In this family, E-matching [13,16] shows reasonable cost when combined with conflict-based instantiation [30] or semantic triggers [17,18]. In pure first-order logic (without theories), quantifiers are mainly handled through resolution and superposition [1,26], as done in Vampire [24,33] and E [34].


Complete Methods for Specific Theories. Much work has been done on 
designing complete decision procedures for quantified theories of interest, notably 
array properties [8], quantified theory of bitvectors [23,36], Presburger arithmetic 
or Real Linear Arithmetic [9,19]. Yet, they usually come at a high cost. 


Generic Methods for Model Generation. Some recent works detail 
attempts at more general approaches to model generation. 

Local theory extensions [2,22] provide means to extend some decidable theo- 
ries with free symbols and quantifications, retaining decidability. The approach 
identifies specific forms of formulas and quantifications (bounded), such that 
these theory extensions can be solved using finite instantiation of quantifiers 
together with a decision procedure for the original theory. The main drawback 
is that the formula size can increase a lot. 

Model-based quantifier instantiation is an active area of research notably 
developed in Z3 and CVC4. The basic line is to consider the partial model under 
construction in order to find the right quantifier instantiations, typically in a try-and-refine manner. Depending on the variant, these methods favor either satisfiability or unsatisfiability. They build on the underlying quantifier-free solver
and can be mixed with E-matching techniques, yet each refinement yields a 
solver call and the refinement process may not terminate. Ge and de Moura [20] 
study decidable fragments of first-order logic modulo theories for which model- 
based quantifier instantiation yields soundness and refutational completeness. 
Reynolds et al. [30], Barbosa [3] and Preiner et al. [28] use models to guide the 
instantiation process towards instances refuting the current model. Finite model 
quantifier instantiation [31,32] reduces the search to finite models, and is indeed 
geared toward model generation rather than unsatisfiability. Similar techniques
have been used in program synthesis [29]. 

We drop support for the unsatisfiable case but get more flexibility: we deal 
with quantifiers on any sort, the approach terminates and is lightweight, in the 
sense that it requires a single call to the underlying quantifier-free solver. 


Other. Our method can be seen as taking inspiration from program taint analysis [15,27] developed for checking the non-interference [35] of public and secret input in security-sensitive programs. As far as the analogy goes, our approach
should not be seen as checking non-interference, but rather as inferring precon- 
ditions of non-interference. Moreover, our formula-tainting technique is closer 
to dynamic program-tainting than to static program-tainting, in the sense that 
precise dependency conditions are statically inserted at preprocess-time, then 
precisely explored at solving-time. 

Finally, Darvas et al. [11] present a bottom-up formula strengthening method. Their goal differs from ours, as they are interested in formula well-definedness (rather than independence) and validity (rather than model generation).


9 Conclusion 


This paper addresses the problem of generating models of quantified first-order 
formulas over built-in theories. We propose a correct and generic approach based 
on a reduction to the quantifier-free case through the inference of independence 
conditions. The technique is applicable to any theory with a decidable quantifier-free case and allows reusing all the work done on quantifier-free solvers. The method significantly enhances the performance of state-of-the-art SMT solvers for the quantified case, and supplements the latest advances in the field.

Future developments aim to tackle the definition of more precise inference 
mechanisms of independence conditions, the identification of interesting sub- 
classes for which inferring weakest independence conditions is feasible, and the 
combination with other quantifier instantiation techniques. 
Abstract. We present POS, a concurrency testing approach that samples the partial order of concurrent programs. POS uses a novel priority-based scheduling algorithm that dynamically reassigns priorities regarding the partial order information and formally ensures that each partial order will be explored with significant probability. POS is simple to implement and provides a probabilistic guarantee of error detection better than state-of-the-art sampling approaches. Evaluations show that POS is effective in covering the partial-order space of micro-benchmarks and finding concurrency bugs in real-world programs, such as Firefox’s JavaScript engine SpiderMonkey.


1 Introduction 


Concurrent programs are notoriously difficult to test. Executions of different threads can interleave arbitrarily, and any such interleaving may trigger unexpected errors and lead to serious production failures [13]. Traditional testing over concurrent programs relies on the system scheduler to interleave executions (or events) and is limited in its ability to detect bugs because some interleavings are repeatedly tested while many others are missed.

Systematic testing [9,16,18,28-30], instead of relying on the system scheduler, utilizes formal methods to systematically schedule concurrent events and attempt to cover all possible interleavings. However, the interleaving space of concurrent programs is exponential in the execution length and often far exceeds the testing budget, leading to the so-called state-space explosion problem. Techniques such as partial order reduction (POR) [1,2,8,10] and dynamic interface reduction [11] have been introduced to reduce the interleaving space. But, in most cases, the reduced space of a complex concurrent program is still too large to test exhaustively. Moreover, systematic testing often uses a deterministic search algorithm (e.g., depth-first search) that only slightly adjusts the interleaving at each iteration, e.g., by flipping the order of two events. Such a search may very well get stuck in a homogeneous interleaving subspace and waste the testing budget by exploring mostly equivalent interleavings.

To mitigate the state-space explosion problem, randomized scheduling algorithms have been proposed to sample, rather than enumerate, the interleaving space
Thread A            Thread B
assert(x == 0);     step(1);
                    ...
                    step(m-1);
                    x = 1;
(a)

Thread A            Thread B
A1: x++;            B1: y--;
barrier();          barrier();
A2: x--;            B2: x = 0;
A3: y++;            B3: y = 1;
(b)


Fig. 1. (a) An example illustrating random walk’s weakness in probabilistic guarantee 
of error detection, where variable x is initially 0; (b) An example illustrating PCT’s 
redundancy in exploring the partial order. 


while still keeping the diversity of the interleavings explored [28]. The most straightforward sampling algorithm is random walk: at each step, randomly pick an enabled event to execute. Previous work showed that even such sampling outperformed exhaustive search at finding errors in real-world concurrent programs [24]. This can be explained by applying the small-scope hypothesis [12, Sect. 5.1.3] to the domain of concurrency error detection [17]: errors in real-world concurrent programs are non-adversarial and can often be triggered if a small number of events happen in the right order, which sampling has a good probability of achieving.

Random walk, however, has an unsurprisingly poor probabilistic guarantee of
error detection. Consider the program in Fig. 1a. The assertion of thread A
fails if, and only if, the statement “x = 1” of thread B is executed before this
assertion. Without knowing a priori which order (between the assertion and
“x = 1”) triggers this failure, we should sample both orders uniformly, because
the probabilistic guarantee of detecting this error is the minimum sampling
probability of these two orders. Unfortunately, random walk may yield extremely
non-uniform sampling probabilities for different orders when only a small number
of events matter. In this example, to trigger the failure, the assertion of
thread A has to be delayed (i.e., not picked) m times in a row by random walk,
making its probabilistic guarantee as low as 1/2^m.
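To make the 1/2^m figure concrete, the following Python sketch computes the exact probability that random walk fails the assertion of Fig. 1a, by recursion over the number of thread-B events already executed. Modeling thread B as m plain events, and ignoring what each step(i) computes, is a simplifying assumption of this sketch.

```python
from fractions import Fraction
from functools import lru_cache


def random_walk_failure_probability(m: int) -> Fraction:
    """Exact probability that random walk makes the assertion of Fig. 1a fail.

    Assumed model: thread A has a single event (the assertion) and thread B has
    m events (step(1), ..., step(m-1), x = 1). While both threads are enabled,
    random walk picks each of them with probability 1/2.
    """

    @lru_cache(maxsize=None)
    def prob(b_done: int) -> Fraction:
        # State: the assertion has not run yet and b_done events of B have run.
        if b_done == m:
            # B is finished, so x == 1 when the assertion runs: it fails.
            return Fraction(1)
        # With probability 1/2 the assertion runs now (x is still 0, no failure);
        # otherwise B takes one more step.
        return Fraction(1, 2) * prob(b_done + 1)

    return prob(0)


for m in (1, 5, 10):
    print(m, random_walk_failure_probability(m))
```

Running it prints 1/2, 1/32 and 1/1024 for m = 1, 5 and 10, respectively.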

To sample different orders more uniformly, Probabilistic Concurrency Testing
(PCT) [4] relies on a user-provided parameter d, the number of events to delay:
it randomly picks d events within the execution and inserts a preemption before
each of them. Since the events are picked randomly by PCT, the corresponding
interleaving space is sampled more uniformly, resulting in a much stronger
probabilistic guarantee than random walk. Consider the program in Fig. 1a again.
To trigger the failure, no event needs to be delayed; it suffices that the right
thread (i.e., thread B) runs first. Thus, the probability of triggering (or
avoiding) the failure is 1/2, which is much higher than 1/2^m.
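The following Python sketch is a simplified PCT-style scheduler written from the description above: threads receive random distinct initial priorities, and a preemption is inserted before each of d randomly chosen steps by demoting the thread about to run. The demotion details follow our reading of PCT [4] and are an assumption; this is not the reference implementation. With d = 0 on Fig. 1a, the failure occurs exactly when thread B receives the higher initial priority, so the estimate should be close to 1/2.

```python
import random


def pct_schedule(thread_lengths, depth, max_steps, rng):
    """Return the thread order chosen by a simplified PCT-style scheduler.

    thread_lengths: number of events per thread (blocking is not modelled).
    depth: number d of preemption points, placed before d random steps.
    max_steps: assumed bound on the execution length, used to place them.
    """
    n = len(thread_lengths)
    # Random distinct initial priorities, all above the ones used for demotion.
    prios = list(range(depth + 1, depth + n + 1))
    rng.shuffle(prios)
    # Step index -> the low priority given to the thread preempted at that step.
    points = rng.sample(range(max_steps), depth)
    demote_to = {step: depth - i for i, step in enumerate(points)}
    remaining = list(thread_lengths)
    order = []
    for step in range(sum(thread_lengths)):
        enabled = [t for t in range(n) if remaining[t] > 0]
        chosen = max(enabled, key=lambda t: prios[t])
        if step in demote_to:
            # Preemption: demote the thread about to run, then pick again.
            prios[chosen] = demote_to[step]
            chosen = max(enabled, key=lambda t: prios[t])
        order.append(chosen)
        remaining[chosen] -= 1
    return order


def estimate_failure(m, depth, runs=20_000, seed=1):
    """Fraction of runs in which thread 1 (B, m events) finishes before thread 0 (A)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        order = pct_schedule([1, m], depth, m + 1, rng)
        hits += order[-1] == 0  # A's only event (the assertion) ran last
    return hits / runs


print(estimate_failure(m=10, depth=0))
```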

However, PCT does not consider the partial order of events entailed by a
concurrent program, so the explored interleavings are still quite redundant.
Consider the example in Fig. 1b. Both A1 and B1 are executed before the barrier
and do not race with any statement. Statements A2 and B2 form a race, and so do
statements A3 and B3. Depending on how each race is resolved, the program events
have four different partial orders in total. However, without considering the
effects of barriers, PCT will attempt to delay A1 or B1 in vain. Furthermore,
without considering the race condition, PCT may first test an interleaving
A2 → A3 → B2 → B3 (by delaying A3 and B2), and then test the partial-order
equivalent, and thus completely redundant, interleaving A2 → B2 → A3 → B3 (by
delaying A3 and B3). Such redundancies in PCT waste testing resources and weaken
the probabilistic guarantee.

Towards addressing the above challenges, this paper makes three main
contributions. First, we present a concurrency testing approach, named partial
order sampling (POS), that samples the concurrent program execution based on the
partial orders and provides strong probabilistic guarantees of error detection.
In contrast to the sophisticated algorithms and heavy bookkeeping used in prior
POR work, the core algorithm of POS is much more straightforward. In POS, each
event is assigned a random priority and, at each step, the event with the
highest priority is executed. After each execution, all events that race with
the executed event are reassigned fresh random priorities. Since each event has
its own priority, POS (1) samples the orders of a group of dependent events
uniformly and (2) uses one execution to sample independent event groups in
parallel, both of which benefit its probabilistic guarantee. The priority
reassignment is also critical. Consider racing events e1 and e2, and an initial
priority assignment that runs e1 first. Without the priority reassignment, e2
may very well be delayed again when a new racing event e3 occurs, because e2's
priority is more likely to be small (the reason that e2 was delayed after e1 in
the first place). The priority reassignment ensures that POS samples the two
orders of e2 and e3 uniformly.

Secondly, the probabilistic guarantee of POS has been formally analyzed and
shown to be exponentially stronger than random walk and PCT for general
programs. The probability for POS to execute any partial order can be calculated
by modeling the ordering constraints as a bipartite graph and computing the
probability that these constraints are satisfied by a random priority
assignment. Although prior POR work typically has soundness proofs of the space
reduction [1,8], those proofs depend on an exhaustive search strategy, and it is
unclear how they can be adapted to randomized algorithms. Some randomized
algorithms leverage POR to heuristically avoid redundant exploration, but no
formal analysis of their probabilistic guarantee is given [22,28]. To the best
of our knowledge, POS is the first work to sample partial orders with a formal
probabilistic guarantee of error detection.

Lastly, POS has been implemented and evaluated using both randomly generated
programs and real-world concurrent software, such as Firefox's JavaScript engine
SpiderMonkey, in SCTBench [24]. Our POS implementation supports shared-memory
multithreaded programs using Pthreads. The evaluation results show that POS
provided 134.1× stronger overall guarantees than random walk and PCT on randomly
generated programs, and that its error detection is 2.6× faster than random walk
and PCT on SCTBench. POS found the six most difficult bugs in SCTBench with the
highest probability among all algorithms evaluated, and performed the best on 20
of the 32 non-trivial bugs in our evaluation.


Related Work. There is a rich literature on concurrency testing. Systematic
testing [9,14,18,28] exhaustively enumerates all possible schedules of a
program, which suffers from the state-space explosion problem. Partial order
reduction techniques [1,2,8,10] alleviate this problem by avoiding the
exploration of schedules that are redundant under partial order equivalence, but
they rely on bookkeeping of the massive exploration history to identify
redundancy, and it is unclear how they can be applied to sampling methods.

PCT [4] explores schedules containing orderings of small sets of events and
guarantees probabilistic coverage of bugs involving rare orders of a small number
of events. PCT, however, does not take partial orders into account and becomes
ineffective when dealing with a large number of ordering events. Also, the need
for user-provided parameters diminishes the coverage guarantee, as the
parameters are often provided imprecisely. Chistikov et al. [5] introduced
hitting families to cover all admissible total orders of a set of events.
However, this approach may cover redundant total orders that correspond to the
same partial order. RAPOS [22] leverages ideas from partial order reduction and
resembles our work in its goal, but it does not provide a formal proof of its
probabilistic guarantee. Our micro-benchmarks show that POS has a 5.0× overall
advantage over RAPOS (see Sect. 6.1).

Coverage-driven concurrency testing [26,32] leverages relaxed coverage metrics
to discover rarely explored interleavings. Directed testing [21,23] focuses on
exploring specific types of interleavings, such as data races and atomicity
violations, to reveal bugs. There is a large body of other work showing how to
detect concurrency bugs using static analysis [19,25] or dynamic analysis
[7,15,20]. However, none of these approaches can be effectively applied to
real-world software systems while still having formal probabilistic guarantees.


2 Running Example 


Figure 2 shows the running example of this paper. In this example, we assume
that memory accesses are sequentially consistent and all shared variables (e.g.,
x, w, etc.) are initialized to 0. The program consists of two threads, A and B.
Thread B will be blocked at B4 by wait(w) until w > 0. Thread A will set w to 1
at A3 via signal(w) and unblock thread B. The assertion at A4 will fail if, and
only if, the program is executed in the following total order:

    B1 → A1 → B2 → B3 → A2 → A3 → B4 → B5 → B6 → A4

    global int x = y = z = w = 0;

    Thread A               Thread B
    A1: x++;               B1: x = 1;
    A2: y++;               B2: a = x;
    A3: signal(w);         B3: y = a;
    A4: assert(z < 5);     B4: wait(w);
                           B5: b = y;
                           B6: z = a + b;

Fig. 2. The running example involving two threads.

To detect this bug, random walk has to make the correct choice at every step.
Among all ten steps, three of them only have a single option: A2 and A3 must be
executed first to enable B4, and A4 is the only statement left at the last step.
Thus, the probability of reaching the bug is 1/2^7 = 1/128. As for PCT, we have
to insert two preemption points just before statements B2 and A2 among ten
statements, so the probability for PCT is 1/10 × 1/10 × 1/2 = 1/200, where this
1/2 comes from the requirement that thread B has to be executed first.
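As a cross-check of the 1/128 figure, the Python sketch below enumerates every random-walk schedule of Fig. 2, executing the statements on a dictionary of variables and weighting each choice by 1/|enabled|. Treating the locals a and b as ordinary entries of that dictionary, and modeling wait(w) as "B4 is disabled while w ≤ 0", are simplifications of this sketch. Since only one total order fails the assertion, the computed failure probability equals the random-walk probability of that order.

```python
from fractions import Fraction


# Statements of Fig. 2 as functions over a dict of shared and local variables.
def a1(v): v["x"] += 1
def a2(v): v["y"] += 1
def a3(v): v["w"] = 1                       # signal(w)
def a4(v): v["failed"] = not (v["z"] < 5)   # assert(z < 5)
def b1(v): v["x"] = 1
def b2(v): v["a"] = v["x"]
def b3(v): v["y"] = v["a"]
def b4(v): pass                             # wait(w); only enabled once w > 0
def b5(v): v["b"] = v["y"]
def b6(v): v["z"] = v["a"] + v["b"]


THREADS = {
    "A": [("A1", a1), ("A2", a2), ("A3", a3), ("A4", a4)],
    "B": [("B1", b1), ("B2", b2), ("B3", b3), ("B4", b4), ("B5", b5), ("B6", b6)],
}


def enabled(pcs, v):
    """Threads whose next statement can run; B4 (wait(w)) is blocked while w <= 0."""
    result = []
    for tid, stmts in THREADS.items():
        if pcs[tid] < len(stmts):
            label, _ = stmts[pcs[tid]]
            if label == "B4" and v["w"] <= 0:
                continue
            result.append(tid)
    return result


def failure_probability(pcs, v):
    """Exact probability that random walk, started in (pcs, v), fails the assertion."""
    choices = enabled(pcs, v)
    if not choices:
        return Fraction(int(v["failed"]))
    total = Fraction(0)
    for tid in choices:
        next_v, next_pcs = dict(v), dict(pcs)
        THREADS[tid][pcs[tid]][1](next_v)
        next_pcs[tid] += 1
        # Random walk picks each enabled thread with equal probability.
        total += Fraction(1, len(choices)) * failure_probability(next_pcs, next_v)
    return total


INIT = {"x": 0, "y": 0, "z": 0, "w": 0, "a": 0, "b": 0, "failed": False}
print(failure_probability({"A": 0, "B": 0}, INIT))  # 1/128
```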

In POS, this bug can be detected with a substantial probability of 1/48, 
much higher than other approaches. Indeed, our formal guarantees ensure that 
any behavior of this program can be covered with a probability of at least 1/60. 


3 Preliminary 


Concurrent Machine Model. Our concurrent abstract machine models a finite set
of processes and a set of shared objects. The machine state is denoted as s,
which consists of the local state of each process and the state of the shared
objects. The abstract machine assumes sequential consistency and allows
arbitrary interleaving among all processes. At each step, starting from s, any
running process can be randomly selected to make a move that updates the state
to s' and generates an event e, denoted as $s \xrightarrow{e} s'$.

An event e is a tuple e := (pid, intr, obj, ind), where pid is the process ID,
intr is the statement (or instruction) pointer, obj is the shared object
accessed by this step (we assume each statement accesses at most a single shared
object), and ind indicates how many times this intr has been executed and is
used to distinguish different runs of the same instruction. For example, the
execution of the statement “A2: y++” in Fig. 2 will generate the event
(A, A2, y, 0). Such an event captures the information of the corresponding step
and can be used to replay the execution. In other words, given the starting
state s and the event e, the resulting state s' of a step $s \xrightarrow{e} s'$
is determined.
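The event tuple maps directly onto a record type. The Python sketch below, whose helper name to_trace is hypothetical, builds events from (pid, intr, obj) steps and fills in ind by counting earlier executions of the same statement, which reproduces the event (A, A2, y, 0) above; running A2 twice is only an illustrative assumption showing how ind separates repeated runs.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    pid: str    # process ID
    intr: str   # statement (instruction) pointer
    obj: str    # the shared object accessed by this step (at most one)
    ind: int    # how many times intr was executed before this step


def to_trace(steps: List[Tuple[str, str, str]]) -> List[Event]:
    """Turn (pid, intr, obj) steps into events, numbering repeated runs of each intr."""
    runs = {}
    trace = []
    for pid, intr, obj in steps:
        ind = runs.get((pid, intr), 0)
        runs[(pid, intr)] = ind + 1
        trace.append(Event(pid, intr, obj, ind))
    return trace


# Hypothetical trace: A executes A2 twice (as if inside a loop), then B executes B5.
t = to_trace([("A", "A2", "y"), ("A", "A2", "y"), ("B", "B5", "y")])
print(t[0])  # Event(pid='A', intr='A2', obj='y', ind=0)
print(t[1])  # Event(pid='A', intr='A2', obj='y', ind=1)
```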

A trace t is a list of events generated by a sequence of program transitions
(or steps) starting from the initial machine state (denoted as $s_0$). For
example, the following program execution:

$$s_0 \xrightarrow{e_0} s_1 \xrightarrow{e_1} \cdots \xrightarrow{e_n} s_{n+1}$$

generates the trace $t := e_0 \bullet e_1 \bullet \cdots \bullet e_n$, where the
symbol “•” means “cons-ing” an event to the trace. Trace events can be accessed
by index (e.g., $t[1] = e_1$). A trace can be used to replay a sequence of
executions. In other words, given the initial machine state $s_0$ and the trace
t, the resulting state of running t (denoted as “State(t)”) is determined.

We write $En(s) := \{e \mid \exists s',\ s \xrightarrow{e} s'\}$ as the set of
events enabled (or allowed to be executed) at state s. Take the program in
Fig. 2 as an example. Initially, both A1 and B1 can be executed, and the
corresponding two events form the enabled set $En(s_0)$. The blocking wait at
B4, however, can be enabled only after being signaled at A3. A state s is called
a terminating state if, and only if, $En(s) = \emptyset$. We assume that any
disabled event will eventually become enabled and every process must end with
either a terminating state or an error state. This indicates that all traces are
finite. For readability, we often abbreviate En(State(t)), i.e., the enabled
event set after executing trace t, as En(t).


Partial Order of Traces. Two events $e_0$ and $e_1$ are called independent
events (denoted as $e_0 \perp e_1$) if, and only if, they neither belong to the
same process nor access the same object:

$$e_0 \perp e_1 := (e_0.pid \neq e_1.pid) \land (e_0.obj \neq e_1.obj)$$


The execution order of independent events does not affect the resulting state.
If a trace t can be generated by swapping adjacent and independent events of
another trace t', then these two traces t and t' are partial order equivalent.
Intuitively, partial order equivalent traces are guaranteed to lead the program
to the same state. The partial order of a trace is characterized by the orders
between all dependent events plus their transitive closure. Given a trace t, its
partial order relation “$\sqsubset_t$” is defined as the minimal relation over
its events that satisfies:

$$(1)\quad \forall i\, j,\; i < j \land t[i] \not\perp t[j] \implies t[i] \sqsubset_t t[j]$$
$$(2)\quad \forall i\, j\, k,\; t[i] \sqsubset_t t[j] \land t[j] \sqsubset_t t[k] \implies t[i] \sqsubset_t t[k]$$


Two traces with the same partial order relation and the same event set must be 
partial order equivalent. 
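Definitions (1) and (2) can be executed directly. The Python sketch below computes the relation for a trace (ind is omitted because no statement repeats) and applies it to the events of Fig. 1b after the barrier: the six interleavings collapse into four partial orders, and the two interleavings that PCT may test in the introduction turn out to be equivalent. The quadratic closure loop favors clarity over efficiency.

```python
from itertools import permutations
from typing import NamedTuple


class Event(NamedTuple):
    pid: str
    intr: str
    obj: str


def independent(e0: Event, e1: Event) -> bool:
    """e0 ⊥ e1: different processes and different shared objects."""
    return e0.pid != e1.pid and e0.obj != e1.obj


def partial_order(trace):
    """The relation ⊏_t: rule (1) orders dependent pairs, rule (2) closes transitively."""
    rel = {(trace[i], trace[j])
           for i in range(len(trace)) for j in range(i + 1, len(trace))
           if not independent(trace[i], trace[j])}
    while True:  # naive transitive closure
        new = {(a, d) for (a, b) in rel for (c, d) in rel if b == c} - rel
        if not new:
            return frozenset(rel)
        rel |= new


# Events of Fig. 1b after the barrier.
A2, A3 = Event("A", "A2", "x"), Event("A", "A3", "y")
B2, B3 = Event("B", "B2", "x"), Event("B", "B3", "y")


def interleavings():
    """All orders of A2, A3, B2, B3 that respect each thread's program order."""
    for p in permutations([A2, A3, B2, B3]):
        if p.index(A2) < p.index(A3) and p.index(B2) < p.index(B3):
            yield list(p)


print(len(list(interleavings())), len({partial_order(t) for t in interleavings()}))  # 6 4
print(partial_order([A2, A3, B2, B3]) == partial_order([A2, B2, A3, B3]))        # True
```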

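Given an event order $\mathcal{E}$ and its order relation $\sqsubset_\mathcal{E}$,
we say a trace t follows $\mathcal{E}$ and write “$t \sim \mathcal{E}$” if, and
only if,

$$\forall e_0\, e_1,\; e_0 \sqsubset_t e_1 \implies e_0 \sqsubset_\mathcal{E} e_1$$

We write “$t \models \mathcal{E}$” to denote that $\mathcal{E}$ is exactly the
partial order of trace t:

$$t \models \mathcal{E} := \forall e_0\, e_1,\; e_0 \sqsubset_t e_1 \iff e_0 \sqsubset_\mathcal{E} e_1$$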


Probabilistic Error-Detection Guarantees. Each partial order of a concurrent
program may lead to a different and potentially incorrect outcome. Therefore,
every possible partial order has to be explored. The minimum probability of
these explorations is called the probabilistic error-detection guarantee of a
randomized scheduler.

Algorithm 1 presents a framework to formally reason about this guarantee. A
sampling procedure Sample samples a terminating trace t of a program. It starts
with the empty trace and repeatedly invokes a randomized scheduler (denoted as
Sch) to append an event to the trace until the program terminates. The
randomized scheduler Sch selects an enabled event from En(t), and the randomness
comes from the random variable parameter, i.e., R.

Algorithm 1. Sample a trace using scheduler Sch and random variable R

1: procedure Sample(Sch, R)
2:   t ← []
3:   while En(t) ≠ ∅ do
4:     e ← Sch(En(t), R)
5:     t ← t • e
6:   end while
7:   return t
8: end procedure

A naive scheduler can be purely random without any strategy. A sophisti- 
cated scheduler may utilize additional information, such as the properties of the 
current trace and the enabled event set. 

Given the randomized scheduler Sch on R and any partial order $\mathcal{E}$ of a
program, we write “$P(\mathrm{Sample}(Sch, R) \models \mathcal{E})$” to denote
the probability of covering $\mathcal{E}$, i.e., generating a trace whose partial
order is exactly $\mathcal{E}$ using Algorithm 1. The probabilistic
error-detection guarantee of the scheduler Sch on R is then defined as the
minimum probability of covering the partial order $\mathcal{E}$ of any
terminating trace of the program:

$$\min_{\mathcal{E}} P(\mathrm{Sample}(Sch, R) \models \mathcal{E})$$
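For a program small enough to enumerate, this guarantee can be computed exactly. The Python sketch below, our own illustration rather than part of the paper's tooling, takes random walk (a scheduler that picks an enabled event uniformly at random), runs Algorithm 1 symbolically over the racing part of Fig. 1b, and accumulates the probability of each partial order. The minimum over the four partial orders, 1/8, is random walk's guarantee for this fragment, while the two orders in which one thread wins both races are covered with probability 3/8 each.

```python
from collections import defaultdict
from fractions import Fraction

# Fig. 1b after the barrier: per-thread events as (process, label, object).
THREADS = {"A": [("A", "A2", "x"), ("A", "A3", "y")],
           "B": [("B", "B2", "x"), ("B", "B3", "y")]}


def po_key(trace):
    """Identify a partial order by how each dependent pair is ordered.

    Every terminating trace here has the same events, so fixing the orientation
    of all dependent pairs also fixes their transitive closure.
    """
    return frozenset((a[1], b[1]) for i, a in enumerate(trace) for b in trace[i + 1:]
                     if a[0] == b[0] or a[2] == b[2])


def random_walk_coverage():
    """Exact probability that Sample with a uniform scheduler covers each partial order."""
    dist = defaultdict(Fraction)

    def walk(pcs, trace, p):
        enabled = [tid for tid, evs in THREADS.items() if pcs[tid] < len(evs)]
        if not enabled:
            dist[po_key(trace)] += p
            return
        for tid in enabled:
            nxt = dict(pcs)
            nxt[tid] += 1
            walk(nxt, trace + [THREADS[tid][pcs[tid]]], p / len(enabled))

    walk({"A": 0, "B": 0}, [], Fraction(1))
    return dist


coverage = random_walk_coverage()
print(len(coverage), min(coverage.values()))  # 4 1/8
```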


4 POS - Algorithm and Analysis 


In this section, we first present BasicPOS, a priority-based scheduler, and
analyze its probability of covering a given partial order (see Sect. 4.1). Based
on the analysis of BasicPOS, we then show that such a priority-based algorithm
can be dramatically improved by introducing priority reassignment, resulting in
our POS algorithm (see Sect. 4.2). Finally, we show how to calculate the
probabilistic error-detection guarantee of POS on general programs (see
Sect. 4.3).


4.1 BasicPOS 


In BasicPOS, each event is associated with a random and immutable priority,
and, at each step, the enabled event with the highest priority is picked to
execute. We use Pri to denote the map from events to priorities and describe
BasicPOS in Algorithm 2, which instantiates the random variable R in
Algorithm 1 with Pri. The priorities Pri(e) of different events are mutually
independent and follow the uniform distribution U(0, 1).
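The Python sketch below implements the BasicPOS scheduler on the running example. Modeling each event as (process, label, object) and the wait/signal pair as "B4 is disabled until A3 has run" are simplifications of ours. Priorities are drawn when an event first becomes enabled and never change afterwards, which is distributionally the same as fixing the whole map Pri in advance. Because every adjacent pair in the bug-triggering order is dependent, that order is the only linearization of its partial order, so comparing label sequences is enough; the estimate should approach the 1/120 derived later in this section.

```python
import random

# Running example (Fig. 2): per-process events as (label, shared object).
PROGRAM = {"A": [("A1", "x"), ("A2", "y"), ("A3", "w"), ("A4", "z")],
           "B": [("B1", "x"), ("B2", "x"), ("B3", "y"), ("B4", "w"), ("B5", "y"), ("B6", "z")]}
# The bug-triggering order of Sect. 2.
BUG_ORDER = ["B1", "A1", "B2", "B3", "A2", "A3", "B4", "B5", "B6", "A4"]


def enabled(pcs, done):
    """Next event of every unfinished process; B4 (wait(w)) needs A3 (signal(w))."""
    events = []
    for pid, evs in PROGRAM.items():
        if pcs[pid] < len(evs):
            label, obj = evs[pcs[pid]]
            if label == "B4" and "A3" not in done:
                continue
            events.append((pid, label, obj))
    return events


def sample_basic_pos(rng):
    """BasicPOS: each event has one immutable U(0,1) priority; run the highest."""
    pri = {}
    pcs = {pid: 0 for pid in PROGRAM}
    trace = []
    while True:
        en = enabled(pcs, trace)
        if not en:
            return trace
        for e in en:
            if e not in pri:
                # One independent draw per event that is never changed afterwards.
                pri[e] = rng.random()
        pid, label, _ = max(en, key=lambda e: pri[e])
        trace.append(label)
        pcs[pid] += 1


def estimate(runs=50_000, seed=0):
    rng = random.Random(seed)
    return sum(sample_basic_pos(rng) == BUG_ORDER for _ in range(runs)) / runs


print(estimate())  # approximately 1/120, i.e. about 0.0083
```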

Algorithm 2. Sample a trace with BasicPOS under the priority map Pri

1: procedure Sample_BasicPOS(Pri)                    ▷ Pri ~ U(0, 1)
2:   t ← []
3:   while En(t) ≠ ∅ do
4:     e* ← argmax_{e ∈ En(t)} Pri(e)
5:     t ← t • e*
6:   end while
7:   return t
8: end procedure

We now consider under what condition BasicPOS samples a trace that follows a
given partial order $\mathcal{E}$ of a program. It means that the generated
trace t, at the end of each loop iteration (line 5 in Algorithm 2), must satisfy
the invariant “$t \sim \mathcal{E}$”. Thus, the event priorities have to be
properly ordered such that, given a trace t satisfying “$t \sim \mathcal{E}$”,
the enabled event e* with the highest priority satisfies
“$t \bullet e^* \sim \mathcal{E}$”. In other words, given “$t \sim \mathcal{E}$”,
for any $e \in En(t)$ with “$t \bullet e \not\sim \mathcal{E}$”, there must be
some $e' \in En(t)$ satisfying “$t \bullet e' \sim \mathcal{E}$” and a proper
priority map where e' has a higher priority, i.e., Pri(e') > Pri(e). Thus, e
will not be selected as the event e* at line 4 in Algorithm 2. The following
Lemma 1 indicates that such an event e' always exists:


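Lemma 1

$$\forall t\, e,\; t \sim \mathcal{E} \land e \in En(t) \land t \bullet e \not\sim \mathcal{E} \implies \exists e',\; e' \in En(t) \land t \bullet e' \sim \mathcal{E} \land e' \sqsubset_\mathcal{E} e$$

Proof. We prove it by contradiction. Since traces are finite, we assume that
some traces are counterexamples to the lemma and t is the longest such trace.
In other words, we have $t \sim \mathcal{E}$ and there exists $e \in En(t)$ with
$t \bullet e \not\sim \mathcal{E}$ such that:

$$\forall e',\; e' \in En(t) \land t \bullet e' \sim \mathcal{E} \implies \neg(e' \sqsubset_\mathcal{E} e) \qquad (1)$$

Since $\mathcal{E}$ is the partial order of a terminating trace and the trace t
has not terminated yet, we know that there must exist an event $e' \in En(t)$
such that $t \bullet e' \sim \mathcal{E}$. Let $t' := t \bullet e'$; by (1), we
have $\neg(e' \sqsubset_\mathcal{E} e)$ and

$$e \in En(t') \;\land\; t' \bullet e \not\sim \mathcal{E} \;\land\; \forall e'',\; e'' \in En(t') \land t' \bullet e'' \sim \mathcal{E} \implies \neg(e'' \sqsubset_\mathcal{E} e)$$

The first two statements are intuitive. The third one also holds; otherwise,
$e' \sqsubset_\mathcal{E} e$ could be implied by the transitivity of partial
orders using e''. Thus, t' is a counterexample that is longer than t,
contradicting our assumption.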


Thanks to Lemma 1, we then only need to construct a priority map such that this
e' has a higher priority. Let
“$e \parallel_\mathcal{E} e' := \exists t,\; t \sim \mathcal{E} \land \{e, e'\} \subseteq En(t)$”
denote that e and e' can be simultaneously enabled under $\mathcal{E}$. We write

$$PS_\mathcal{E}(e) := \{e' \mid e' \sqsubset_\mathcal{E} e \land e \parallel_\mathcal{E} e'\}$$

as the set of events that can be simultaneously enabled with, but have to be
selected prior to, e in order to follow $\mathcal{E}$. Any e' specified by
Lemma 1 must belong to $PS_\mathcal{E}(e)$. Let $V_\mathcal{E}$ be the event set
ordered by $\mathcal{E}$. The priority map Pri can be constructed as below:

$$\bigwedge_{e \in V_\mathcal{E},\; e' \in PS_\mathcal{E}(e)} Pri(e) < Pri(e') \qquad \text{(Cond-BasicPOS)}$$


The traces sampled by BasicPOS using this Pri will always follow $\mathcal{E}$.

Although (Cond-BasicPOS) is not a necessary condition for sampling a trace
following a desired partial order, from our observation, it gives a good
estimation for the worst cases. This leads us to locate the major weakness of
BasicPOS: the constraint propagation of priorities. An event e with a large
$PS_\mathcal{E}(e)$ set may have a relatively low priority, since its priority
has to be lower than those of all the events in $PS_\mathcal{E}(e)$. Thus, for
any simultaneously enabled event e' that has to be delayed after e, Pri(e') must
be even smaller than Pri(e), which is unnecessarily hard to satisfy for a random
Pri(e'). Due to this constraint propagation, the probability that a priority map
Pri satisfies (Cond-BasicPOS) can be as low as $1/|V_\mathcal{E}|!$.


Here, we explain how BasicPOS samples the following trace that triggers the bug
described in Sect. 2:

$$t_{bug} := (B, B1, x, 0) \bullet (A, A1, x, 0) \bullet (B, B2, x, 0) \bullet (B, B3, y, 0) \bullet (A, A2, y, 0)$$
$$\qquad \bullet\, (A, A3, w, 0) \bullet (B, B4, w, 0) \bullet (B, B5, y, 0) \bullet (B, B6, z, 0) \bullet (A, A4, z, 0)$$

To sample trace $t_{bug}$, according to (Cond-BasicPOS), the priority map has
to satisfy the following constraints:

$$Pri(t_{bug}[0] = (B, B1, x, 0)) > Pri(t_{bug}[1] = (A, A1, x, 0))$$
$$Pri(t_{bug}[1]) > Pri(t_{bug}[2] = (B, B2, x, 0))$$
$$Pri(t_{bug}[2]) > Pri(t_{bug}[4] = (A, A2, y, 0))$$
$$Pri(t_{bug}[3] = (B, B3, y, 0)) > Pri(t_{bug}[4])$$
$$Pri(t_{bug}[6] = (B, B4, w, 0)) > Pri(t_{bug}[9] = (A, A4, z, 0))$$
$$Pri(t_{bug}[7] = (B, B5, y, 0)) > Pri(t_{bug}[9])$$
$$Pri(t_{bug}[8] = (B, B6, z, 0)) > Pri(t_{bug}[9])$$

Note that these are also the necessary constraints for BasicPOS to follow the
partial order of $t_{bug}$. The probability that a random Pri satisfies the
constraints is 1/120. The propagation of the constraints can be illustrated by
the first three steps:

$$Pri(t_{bug}[0]) > Pri(t_{bug}[1]) > Pri(t_{bug}[2]) > Pri(t_{bug}[4])$$

which happens with probability 1/24. On the other hand, random walk can sample
these three steps with probability 1/8.
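Because ties have probability zero, the probability that independent U(0, 1) priorities satisfy a set of "greater than" constraints equals the fraction of orderings consistent with them. The Python sketch below, our own illustration, counts those orderings with a dynamic program over subsets and confirms the two numbers above: 1/120 for all seven constraints and 1/24 for the propagated chain. The variable names t_bug[i] are just labels for the priorities of the corresponding events.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial


def prob_constraints(variables, greater_than):
    """Probability that i.i.d. U(0,1) priorities satisfy Pri(a) > Pri(b) for all (a, b).

    Ties have probability zero, so this is (#orderings of `variables`, largest
    first, that respect every constraint) / n!, counted by a DP over subsets.
    """
    idx = {v: i for i, v in enumerate(variables)}
    n = len(variables)
    larger = [0] * n  # larger[i]: bitmask of variables that must exceed variable i
    for a, b in greater_than:
        larger[idx[b]] |= 1 << idx[a]

    @lru_cache(maxsize=None)
    def count(placed):
        # `placed`: bitmask of variables already assigned the largest values.
        if placed == (1 << n) - 1:
            return 1
        return sum(count(placed | (1 << i)) for i in range(n)
                   if not (placed >> i) & 1 and (larger[i] & placed) == larger[i])

    return Fraction(count(0), factorial(n))


t = [f"t_bug[{i}]" for i in range(10)]
basic_pos = [(t[0], t[1]), (t[1], t[2]), (t[2], t[4]), (t[3], t[4]),
             (t[6], t[9]), (t[7], t[9]), (t[8], t[9])]
print(prob_constraints(t, basic_pos))                                       # 1/120
print(prob_constraints(t[:5], [(t[0], t[1]), (t[1], t[2]), (t[2], t[4])]))  # 1/24
```

The same function applies to versioned priorities Pri(e, i) once each pair (e, i) is treated as its own variable.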
4.2 POS


We will now show how to improve BasicPOS by eliminating the propagation of
priority constraints. Consider the situation when an event e (delayed at some
trace t) becomes eligible to schedule right after scheduling some e', i.e.,

$$t \sim \mathcal{E} \land \{e, e'\} \subseteq En(t) \land t \bullet e \not\sim \mathcal{E} \land t \bullet e' \bullet e \sim \mathcal{E}$$

If we reset the priority of e right after scheduling e', all the constraints
causing the delay of e will not be propagated to the events e'' such that
$e \in PS_\mathcal{E}(e'')$. However, there is no way for us to know which e
should be reset after e' during the sampling, since $\mathcal{E}$ is unknown and
not provided. Notice that

$$t \sim \mathcal{E} \land \{e, e'\} \subseteq En(t) \land t \bullet e \not\sim \mathcal{E} \land t \bullet e' \bullet e \sim \mathcal{E} \implies e.obj = e'.obj$$

If we reset the priorities of all the events that access the same object as e',
the propagation of priority constraints will also be eliminated.

To analyze how POS follows $\mathcal{E}$ under the reassignment scheme, we have
to model how many priorities need to be reset at each step. Note that blindly
reassigning the priorities of all delayed events at each step would be
suboptimal, as it degenerates the algorithm to random walk. To give a formal and
more precise analysis, we introduce the object index functions for trace t and
partial order $\mathcal{E}$:

$$I(t, e) := |\{e' \mid e' \in t \land e.obj = e'.obj\}|$$
$$I_\mathcal{E}(e) := |\{e' \mid e' \sqsubset_\mathcal{E} e \land e.obj = e'.obj\}|$$

Intuitively, when $e \in En(t)$, scheduling e on t will operate on e.obj after
I(t, e) previous events. A trace t follows $\mathcal{E}$ if every step
(indicated by t[i]) operates on the object t[i].obj after $I_\mathcal{E}(t[i])$
previous events in the trace.
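The two index functions are simple counts. The Python sketch below computes them for t_bug, representing events as (label, object) pairs (a simplification of ours) and reading I_E off t_bug itself, which is valid because events on the same object are dependent and therefore keep their relative order in every linearization. After B1 has run, A1 has reached its index in E and may be scheduled, while B2 has not and must be delayed, matching the first scheduling decisions in t_bug.

```python
# t_bug as (label, object) pairs; it is the only linearization of its partial order.
T_BUG = [("B1", "x"), ("A1", "x"), ("B2", "x"), ("B3", "y"), ("A2", "y"),
         ("A3", "w"), ("B4", "w"), ("B5", "y"), ("B6", "z"), ("A4", "z")]


def index_in_trace(t, e):
    """I(t, e): number of events in t that access e.obj."""
    return sum(1 for _, obj in t if obj == e[1])


def index_in_order(linearization, e):
    """I_E(e), given any linearization of E.

    Events on the same object are dependent, so they appear in the same relative
    order in every linearization; counting those that precede e yields I_E(e).
    """
    return index_in_trace(linearization[:linearization.index(e)], e)


print({e[0]: index_in_order(T_BUG, e) for e in T_BUG})
# {'B1': 0, 'A1': 1, 'B2': 2, 'B3': 0, 'A2': 1, 'A3': 0, 'B4': 1, 'B5': 2, 'B6': 0, 'A4': 1}

t = T_BUG[:1]  # after B1 has run, A1 and B2 are enabled
for e in [("A1", "x"), ("B2", "x")]:
    print(e[0], index_in_trace(t, e), index_in_order(T_BUG, e))
# A1 1 1  (I(t, e) = I_E(e): scheduling A1 keeps t following E)
# B2 1 2  (I(t, e) < I_E(e): B2 must be delayed)
```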

We then index (or version) the priority of event e using the index function as
Pri(e, I(t, e)) and introduce POS, shown in Algorithm 3. By proving that

$$\forall e',\; I(t, e) \le I(t \bullet e', e) \land (I(t, e) = I(t \bullet e', e) \implies e.obj \neq e'.obj)$$

we have that scheduling an event e will increase the priority version of all the
events accessing e.obj, resulting in the priority reassignment. We can then
prove that the following statements hold:

$$\forall t\, e,\; t \sim \mathcal{E} \land e \in En(t) \implies (t \bullet e \sim \mathcal{E} \iff I(t, e) = I_\mathcal{E}(e))$$
$$\forall t\, e,\; t \sim \mathcal{E} \land e \in En(t) \land t \bullet e \not\sim \mathcal{E} \implies I(t, e) < I_\mathcal{E}(e)$$

To ensure that the selection of e* on trace t follows $\mathcal{E}$ at line 4 of
Algorithm 3, any e satisfying $I(t, e) < I_\mathcal{E}(e)$ has to have a smaller
priority than some e' satisfying $I(t, e') = I_\mathcal{E}(e')$, and such an e'
must exist by Lemma 1. In this way, the priority constraints for POS to sample
$\mathcal{E}$ are as below:

$$\bigwedge Pri(e, i) < Pri(e', I_\mathcal{E}(e')) \quad \text{for some } i < I_\mathcal{E}(e)$$

which is bipartite, and the propagation of priority constraints is eliminated.
The effectiveness of POS is guaranteed by Theorem 1.
Algorithm 3. Sample a trace with POS under versioned priority map Pri

1: procedure Sample_POS(Pri)                         ▷ Pri ~ U(0, 1)
2:   t ← []
3:   while En(t) ≠ ∅ do
4:     e* ← argmax_{e ∈ En(t)} Pri(e, I(t, e))
5:     t ← t • e*
6:   end while
7:   return t
8: end procedure


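Theorem 1. Given any partial order $\mathcal{E}$ of a program with P > 1
processes, let

$$D_\mathcal{E} := |\{(e, e') \mid e \sqsubset_\mathcal{E} e' \land e \not\perp e' \land e \parallel_\mathcal{E} e'\}|$$

be the number of races in $\mathcal{E}$. Then:

1. $D_\mathcal{E} \le |V_\mathcal{E}| \times (P - 1)$, and
2. POS has at least the following probability to sample a trace $t \sim \mathcal{E}$:

$$\left(\frac{1}{P}\right)^{|V_\mathcal{E}|} R^{U}$$

where $R = P \times |V_\mathcal{E}| / (|V_\mathcal{E}| + D_\mathcal{E}) \ge 1$ and
$U = (|V_\mathcal{E}| - \lceil D_\mathcal{E} / (P - 1) \rceil) / 2 \ge 0$.

Please refer to the technical report [33] for the detailed proof and the
construction of priority constraints.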
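The bound is easy to evaluate numerically. In the Python sketch below, the race count D_E = 4 for t_bug is our own hand count (the races listed in the code comment), not a figure stated in the text. The result, about 1/351, is a valid but loose lower bound compared with the exact 1/48 for this partial order computed below; the bound targets worst-case behavior across general programs. With the maximal race count D_E = |V_E| × (P − 1), R = 1 and U = 0, so the bound reduces to (1/P)^|V_E|.

```python
from math import ceil, isclose


def pos_lower_bound(num_events: int, num_races: int, num_procs: int) -> float:
    """Theorem 1: (1/P)^|V_E| * R^U for a partial order with |V_E| events and D_E races."""
    if num_procs < 2 or not 0 <= num_races <= num_events * (num_procs - 1):
        raise ValueError("Theorem 1 needs P > 1 and 0 <= D_E <= |V_E| * (P - 1)")
    r = num_procs * num_events / (num_events + num_races)       # R >= 1
    u = (num_events - ceil(num_races / (num_procs - 1))) / 2    # U >= 0
    return (1 / num_procs) ** num_events * r ** u


# t_bug: |V_E| = 10 events and P = 2; D_E = 4 is our own count of its races
# (B1, A1), (A1, B2), (B3, A2) and (B6, A4).
bound = pos_lower_bound(10, 4, 2)
print(bound, bound <= 1 / 48)
print(isclose(pos_lower_bound(10, 10, 2), 0.5 ** 10))  # maximal races: (1/P)^|V_E|
```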
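Here, we show how POS improves BasicPOS on the example in Sect. 2. The priority
constraints for POS to sample the partial order of $t_{bug}$ are as below:

$$Pri(t_{bug}[0], 0) > Pri(t_{bug}[1], 0)$$
$$Pri(t_{bug}[1], 1) > Pri(t_{bug}[2], 1)$$
$$Pri(t_{bug}[2], 2) > Pri(t_{bug}[4], 0) \;\land\; Pri(t_{bug}[3], 0) > Pri(t_{bug}[4], 0)$$
$$Pri(t_{bug}[6], 1) > Pri(t_{bug}[9], 0) \;\land\; Pri(t_{bug}[7], 2) > Pri(t_{bug}[9], 0) \;\land\; Pri(t_{bug}[8], 0) > Pri(t_{bug}[9], 0)$$

Since each Pri(e, i) is independently random following U(0, 1), the probability
of Pri satisfying the constraints is 1/2 × 1/2 × 1/3 × 1/4 = 1/48.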
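The Python sketch below runs Algorithm 3 on a small model of the running example: events are (process, label, object) triples and the wait/signal pair is modeled as "B4 is disabled until A3 has run", both simplifications of ours. A priority is keyed by the event together with its version I(t, e), drawn lazily, so scheduling an event implicitly gives every other event on the same object a fresh draw. Since t_bug is the only linearization of its partial order, the estimated frequency of producing it should approach 1/48.

```python
import random

# Running example (Fig. 2): per-process events as (label, shared object).
PROGRAM = {"A": [("A1", "x"), ("A2", "y"), ("A3", "w"), ("A4", "z")],
           "B": [("B1", "x"), ("B2", "x"), ("B3", "y"), ("B4", "w"), ("B5", "y"), ("B6", "z")]}
BUG_ORDER = ["B1", "A1", "B2", "B3", "A2", "A3", "B4", "B5", "B6", "A4"]


def enabled(pcs, done):
    """Next event of every unfinished process; B4 (wait(w)) needs A3 (signal(w))."""
    events = []
    for pid, evs in PROGRAM.items():
        if pcs[pid] < len(evs):
            label, obj = evs[pcs[pid]]
            if label == "B4" and "A3" not in done:
                continue
            events.append((pid, label, obj))
    return events


def sample_pos(rng):
    """Algorithm 3: the priority of e is versioned by I(t, e)."""
    pri = {}       # (event, version) -> U(0,1) priority, drawn on first use
    accesses = {}  # object -> number of events in t that accessed it
    pcs = {pid: 0 for pid in PROGRAM}
    trace = []

    def priority(e):
        key = (e, accesses.get(e[2], 0))  # version I(t, e)
        if key not in pri:
            pri[key] = rng.random()
        return pri[key]

    while True:
        en = enabled(pcs, trace)
        if not en:
            return trace
        pid, label, obj = max(en, key=priority)
        trace.append(label)
        # Scheduling e* bumps the version of every event on e*.obj (reassignment).
        accesses[obj] = accesses.get(obj, 0) + 1
        pcs[pid] += 1


def estimate(runs=50_000, seed=0):
    rng = random.Random(seed)
    return sum(sample_pos(rng) == BUG_ORDER for _ in range(runs)) / runs


print(estimate())  # approximately 1/48, i.e. about 0.0208
```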


4.3 Probability Guarantee of POS on General Programs 


We now analyze how POS performs on general programs compared to random walk
and PCT. Consider a program with P processes and N total events. It is common
for a program to have substantial non-racing events, for example, accesses to
shared variables protected by locks, semaphores, and condition variables. We
assume that there exists a ratio 0 < α < 1 such that in any partial order there
are at least αN non-racing events.

Under this assumption, for random walk, we can construct an adversary program
with a worst-case probability of $1/P^N$ for almost any α [33]. For PCT, since
only the order of the (1 − α)N racing events may affect the partial order, the
number of preemptions needed for a partial order in the worst case becomes
(1 − α)N, and thus the worst-case probability bound is $1/N^{(1-\alpha)N}$. For
POS, the number of races $D_\mathcal{E}$ is reduced to (1 − α)N × (P − 1) in the
worst case, and Theorem 1 guarantees the probability lower bound as

$$\left(\frac{1}{P}\right)^{N} \left(\frac{P}{P - \alpha P + \alpha}\right)^{\alpha N / 2}$$

Thus, POS outperforms random walk when α > 0 and degenerates to random walk when
α = 0. Also, POS outperforms PCT if N > P (when α = 0) or
$N^{1/\alpha - 1} > P^{1/\alpha} \sqrt{1 + \alpha/P - \alpha}$ (when 0 < α < 1).
For example, when P = 2 and α = 1/2, POS outperforms PCT if $N > 2\sqrt{3}$. In
other words, in this case, POS is better than PCT if there are at least four
total events.
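To see where POS overtakes PCT, the Python sketch below evaluates the three worst-case bounds of this section in log space for P = 2 and α = 1/2. It treats N as a free parameter and ignores integrality (for example of αN/2), a simplification of ours. In line with the N > 2√3 condition, PCT's bound is larger at N = 3 and POS's bound is larger from N = 4 on.

```python
import math


def log_bounds(n: int, p: int, alpha: float):
    """Natural logs of the worst-case lower bounds above: (random walk, PCT, POS).

    n: total events N, p: processes P, alpha: fraction of non-racing events.
    Working with logs avoids floating-point underflow for large N.
    """
    rw = -n * math.log(p)                                # 1 / P^N
    pct = -(1 - alpha) * n * math.log(n)                 # 1 / N^((1 - alpha) N)
    pos = rw + (alpha * n / 2) * math.log(p / (p - alpha * p + alpha))
    return rw, pct, pos


for n in (3, 4, 10, 100):
    rw, pct, pos = log_bounds(n, p=2, alpha=0.5)
    print(n, f"RW {math.exp(rw):.3g}  PCT {math.exp(pct):.3g}  POS {math.exp(pos):.3g}")
```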


5 Implementation 


The algorithm of POS requires a pre-determined priority map, while the
implementation can decide the event priority on demand when new events appear.
The implementation of POS is shown in Algorithm 4, where lines 14-18 are for the
priority reassignment. Variable s represents the current program state with the
following interfaces:


- s.Enabled() returns the current set of enabled events.
- s.Execute(e) returns the resulting state after executing e in state s.
- s.IsRacing(e, e') returns whether there is a race between e and e'.


In the algorithm, if a race is detected during scheduling, the priority of the
delayed event in the race will be removed and then reassigned at lines 6-9.


Relaxation for Read-Only Events. The abstract interface s.IsRacing(...) allows 
us to relax our model for read-only events. When both e and e’ are read-only 
events, s.IsRacing(e, e’) returns false even if they are accessing the same object. 
Our evaluations show that this relaxation improves the execution time of POS. 


Fairness Workaround. POS is probabilistically fair. For an enabled event e with
priority p > 0, the cumulative probability for e to be delayed by k → ∞ steps
without racing is at most (1 − p^P)^k → 0. However, POS may still delay events
for a prolonged time, slowing down the test. To alleviate this, the current
implementation resets all event priorities for every 10° voluntary context
switch events, e.g., sched_yield() calls. This is only useful for speeding up a
few benchmark programs that have busy loops (the sched_yield() calls were added
by the SCTBench creators) and has minimal impact on the probability of hitting
bugs.
Algorithm 4. Testing a program with POS

1: procedure POS(s)                          ▷ s: the initial state of the program
2:   pri ← [ε ↦ −∞]    ▷ Initially, no priority is assigned except the special symbol ε
3:   while s.Enabled() ≠ ∅ do
4:     e* ← ε                                ▷ Assume ε ∉ s.Enabled()
5:     for each e ∈ s.Enabled() do
6:       if e ∉ pri then
7:         newPriority ← U(0, 1)
8:         pri ← pri[e ↦ newPriority]
9:       end if
10:      if pri(e*) < pri(e) then
11:        e* ← e
12:      end if
13:    end for
14:    for each e ∈ s.Enabled() do           ▷ Update priorities
15:      if e ≠ e* ∧ s.IsRacing(e, e*) then
16:        pri ← pri \ {e}    ▷ The priority will be reassigned in the next step
17:      end if
18:    end for
19:    s ← s.Execute(e*)
20:  end while
21:  return s
22: end procedure
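The Python sketch below follows Algorithm 4 step by step. The class RunningExampleState and its methods enabled, execute and is_racing are hypothetical stand-ins for s.Enabled(), s.Execute(e) and s.IsRacing(e, e') on a model of the running example; they are not the Maple-based implementation, and the read-only relaxation is left out. On this model, the frequency of producing t_bug should again be about 1/48, because each race leaves the delayed event with a fresh priority.

```python
import random


class RunningExampleState:
    """Hypothetical program state for Fig. 2 offering the three interfaces above.

    Events are (pid, label, obj) tuples; only scheduling-relevant structure is kept.
    """
    PROGRAM = {"A": [("A1", "x"), ("A2", "y"), ("A3", "w"), ("A4", "z")],
               "B": [("B1", "x"), ("B2", "x"), ("B3", "y"), ("B4", "w"),
                     ("B5", "y"), ("B6", "z")]}

    def __init__(self, pcs=None, trace=()):
        self.pcs = pcs if pcs is not None else {pid: 0 for pid in self.PROGRAM}
        self.trace = tuple(trace)

    def enabled(self):
        events = []
        for pid, evs in self.PROGRAM.items():
            if self.pcs[pid] < len(evs):
                label, obj = evs[self.pcs[pid]]
                if label == "B4" and "A3" not in self.trace:  # wait(w) needs signal(w)
                    continue
                events.append((pid, label, obj))
        return events

    def execute(self, e):
        pcs = dict(self.pcs)
        pcs[e[0]] += 1
        return RunningExampleState(pcs, self.trace + (e[1],))

    @staticmethod
    def is_racing(e1, e2):
        # No read-only relaxation: different processes accessing the same object.
        return e1[0] != e2[0] and e1[2] == e2[2]


def pos(state, rng):
    """Algorithm 4: priorities drawn on demand, dropped for events racing with e*."""
    pri = {}
    while True:
        enabled = state.enabled()
        if not enabled:
            return state
        for e in enabled:                   # lines 6-9: assign missing priorities
            if e not in pri:
                pri[e] = rng.random()
        chosen = max(enabled, key=pri.get)  # lines 4-13; max replaces the ε sentinel
        for e in enabled:                   # lines 14-18: priority reassignment
            if e != chosen and state.is_racing(e, chosen):
                del pri[e]
        state = state.execute(chosen)


BUG_ORDER = ("B1", "A1", "B2", "B3", "A2", "A3", "B4", "B5", "B6", "A4")


def estimate(runs=50_000, seed=0):
    rng = random.Random(seed)
    return sum(pos(RunningExampleState(), rng).trace == BUG_ORDER for _ in range(runs)) / runs


print(estimate())  # approximately 1/48, i.e. about 0.0208
```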


6 Evaluation 


To understand the performance of POS and compare it with other sampling methods,
we conducted experiments on both micro benchmarks (automatically generated) and
macro benchmarks (including real-world programs).


6.1 Micro Benchmark 


We generated programs with a small number of static events as the micro
benchmarks. We assumed multithreaded programs with t threads, where each thread
executes m events accessing o objects. To keep the program space tractable, we
chose t = m = o = 4, resulting in 16 total events. To simulate different object
access patterns in real programs, we randomly distributed the events accessing
different objects with the following configurations:


- Each object has {4,4,4,4} accessing events, respectively. (Uniform)
- Each object has {2,2,6,6} accessing events, respectively. (Skewed)


The results are shown in Table 1. The benchmark columns show the char- 
acteristics of each generated program, including (1) the configuration used for 
generating the program; (2) the number of distinct partial orders in the program; 
(3) the maximum number of preemptions needed for covering all partial orders; 
and (4) the maximum number of races in any partial order. We measured the 
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Table 1. Coverage on the micro benchmark programs. Columns under “benchmark” 
are program characteristics explained in Sect. 6.1. “O(a)” represents incomplete cover- 
age. 


Benchmark Coverage 
Conf. PO. |Max Max RW PCT RAPOS | BasicPOS POS 
count | prempt. | races 
Uniform| 4478 6 19|2.65e—08| 0(4390) 1.84e—06| 0(4475) | 7.94e—06 
7413 6 20|3.97e—08| 0(7257) 3.00e—07| 2.00e—08 | 5.62e—06 
1554 5 19|8.37e—08| 0(1540) 1.78e—06) 4.00e—08 | 8.54e—06 
6289 6 20}1.99e—08| 0(6077) 1.34e—06|] 0(6288) |6.62e—06 
1416 6 21]1.88e—07| 0(1364) 1.99e—05| 1.80e—07 | 4.21e—05 
Skewed | 39078 7 27|5.89e—09| 0(33074) 0(39044)) 0(38857) 1.20e—07 
19706 7 24|4.97e—09| 0(18570) 0(19703)) 0(19634) 5.00e—07 
19512 6 27|2.35e—08| 0(16749) 1.00e—07| 0(19502) | 1.36e—06 
8820 6 23|6.62e—09| 0(8208) 1.00e—07| 0(8816) | 1.20e—06 
7548 7 25}1.32e—08| 0(7438) 1.30e—06| 2.00e—08 | 3.68e—06 
Geo-mean* 2.14e—08 | 2.00e—08 | 4.1le—07) 2.67e—08 | 2.87e—06 


Table 2. Coverage on the micro benchmark programs - 50% read 


Benchmark Coverage 
Conf. |PO. |Max Max RW PCT) RAPOS|BasicPOS POS POS* 
count |/prempt. |races 
Uniform| 896 6 16/7.06e—08 0(883)|9.42e—06| 2.00e—08|9.32e—06|1.41e—05 
1215 6 18)3.53e—08) 0(1204) 8.70e—06} 6.00e—08]1.22e—05|1.51e—05 
1571 Ti 17/8.83e—09 0(1523)|4.22e—06| 0(1566)|7.66e—06|1.09e—05 
3079 6 15)1.99e—08) 0(3064) 8.20e—07| 1.20e—07|7.08e—06|7.68e—06 
1041 4 18 2.51e—07| 0(1032)|/3.05e—05| 2.20e—06|3.32e—05|4.85e—05 
Skewed | 3867 6 19/6.62e—09 0(3733) 1.24e—06} 8.00e—08/4.04e—06|4.24e—06 
1057 6 20/2.12e—07, 0(1055)|4.68e—06| 2.08e—06|2.79e—05/2.80e—05 
1919 6 20/2.09e—07| 0(1917) 2.02e—06| 3.80e—07|1.48e—05/1.48e—05 
11148 7 21/4.71le—08 0(10748)|4.00e—08} 0(11128)|1.58e—06)/3.02e—06 
4800 7 19/3.97e—08 0(4421)|5.00e—07| 0(4778)|1.58e—06|4.80e—06 
Geo-mean* 4.77e—08)|2.00e—08 ) 2.14e—06| 1.05e—07|7.82e—06|1.08e—05 


coverage of each sampling method on each program by the minimum hit ratio on
any partial order of the program. On every program, we ran each sampling method
5 × 10^7 times (except for random walk, for which we calculated the exact
probabilities). If a program was not fully covered by an algorithm within the sample
limit, the coverage is denoted as “0(x)”, where x is the number of covered
partial orders. We let PCT sample the exact number of preemptions needed
for each case. We tweaked PCT to improve its coverage by adding a dummy
event at the beginning of each thread, as otherwise PCT cannot preempt the
actual first event of each thread. The results show that POS performed the best
among all algorithms. For each algorithm, we calculated the overall performance
as the geometric mean of the coverage.¹ Overall, POS performed ~7.0x better than
the best other algorithm (RAPOS), and ~134.1x better than the best remaining
algorithm when RAPOS and BasicPOS are excluded.
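The aggregation can be sketched in Python as follows. This is a minimal illustration, not the authors' evaluation scripts: the function name geo_mean and the convention of passing None for a program that was not fully covered are assumptions, and such cases are counted at the conservative floor 1/(5 × 10^7), i.e., one hit within the sample limit. The last two lines re-derive the ~7.0x and ~134.1x figures from the geometric means in the last row of Table 1 (POS 2.87e-06, RAPOS 4.11e-07, RW 2.14e-08).

```python
import math

def geo_mean(values, floor):
    """Geometric mean of per-program results.

    None marks a program the algorithm did not fully cover; it is
    conservatively counted as `floor` instead of 0 (log(0) is undefined).
    """
    vals = [floor if v is None else v for v in values]
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

FLOOR = 1 / (5 * 10**7)  # one hit within 5 x 10^7 samples

# An algorithm that never fully covers any program sits exactly at the floor,
# consistent with PCT's geometric mean of 2.00e-08 in Tables 1 and 2.
print(f"{geo_mean([None] * 10, FLOOR):.2e}")  # 2.00e-08

# Ratios of the Table 1 geometric means quoted in the text.
print(f"{2.87e-06 / 4.11e-07:.1f}")  # POS vs. RAPOS: 7.0
print(f"{2.87e-06 / 2.14e-08:.1f}")  # POS vs. RW: 134.1
```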

To evaluate our relaxation of read-only events, we generated another set of
programs with the same configurations, but with half of the events read-only. The
results are shown in Table 2, where the relaxed algorithm is denoted as POS*.
Overall, POS* achieved ~1.4x the coverage of POS, ~5.0x that of the best other
algorithm (RAPOS), and ~226.4x that of the best remaining algorithm when RAPOS
and BasicPOS are excluded.


6.2 Macro Benchmark 


We used SCTBench [24], a collection of concurrency bugs in multi-threaded
programs, to evaluate POS on practical programs. SCTBench collected 49 con-
currency bugs from previous parallel workloads [3,27] and concurrency test-
ing/verification work [4,6,18,21,31]. SCTBench comes with a concurrency test-
ing tool, Maple [32], which intercepts pthread primitives and shared memory
accesses and controls their interleaving. When a bug is triggered, it is
caught by Maple and reported back. We implemented POS with the relaxation
of read-only events in Maple. Each sampling method was evaluated in SCTBench
by its bug hit ratio in each case. For each case, we ran each sampling method
until the number of tries reached 10^4, recorded the bug hit count h and the
total run count t, and calculated the hit ratio as h/t.
Two cases in SCTBench were excluded, parsec-2.0-streamcluster2 and
radbench-bug1, because none of the algorithms could hit their bugs even once,
which conflicts with previous results. We strengthened the case safestack-bug1 by
internally repeating the case for 10* times (and shrinking the run limit to 500). This
amortizes the per-run overhead of Maple, which could take up to a few seconds.
We modified PCT to reset at every internal iteration. To reduce the disadvantage
of a sub-optimal d, we evaluated variants PCT-d, i.e., PCT with d − 1 preemption
points. The results are shown in Table 3. We omit cases in which every algorithm
hits the bug in more than half of its tries. The cases are sorted by the minimum
hit ratio across algorithms. The performance of each algorithm is aggregated as the
geometric mean of its hit ratios² over all cases. The best hit ratio for each case is
marked in blue. The results of the macro benchmark experiments can be summarized as follows:


— Overall, POS performed the best in hitting bugs in SCTBench. The geometric
mean of POS is ~2.6x that of the best PCT variant (PCT-20) and ~4.7x that of
random walk. Because the buggy interleavings in each case are not necessarily
the most difficult ones to sample, POS does not outperform the others as
overwhelmingly as in the micro benchmarks.

— Among all 32 cases shown in the table, POS performed the best among all
algorithms in 20 cases, while PCT variants were the best in 10 cases and
random walk was the best in three cases.

— POS hit every bug in the evaluated SCTBench cases, while all PCT variants missed
one case within the limit (and one case with hit ratio of 0.0002), and random walk
missed three cases (and one case with hit ratio of 0.0003).


1 For each case in which an algorithm does not achieve full coverage, we conservatively
count its coverage as 1/(5 × 10^7) in the geometric mean.

2 For each case in which an algorithm never hits the bug within the limit, we conservatively
count its hit ratio as 1/t in the geometric mean.


Table 3. Bug hit ratios on macro benchmark programs


Case                    | RW     | PCT-2  | PCT-3  | PCT-4  | PCT-5  | PCT-20 | POS
01 stringbuffer-jdk1.4  | 0.0638 | 0.0000 | 0.0193 | 0.0420 | 0.0600 | 0.0332 | 0.0833
02 reorder_10_bad       | 0.0000 | 0.0007 | 0.0014 | 0.0017 | 0.0021 | 0.0000 | 0.0308
03 reorder_20_bad       | 0.0000 | 0.0015 | 0.0027 | 0.0040 | 0.0043 | 0.0021 | 0.1709
04 twostage_100_bad     | 0.0000 | 0.0000 | 0.0000 | 0.0002 | 0.0002 | 0.0000 | 0.0047
05 radbench-bug2        | 0.0003 | 0.0000 | 0.0010 | 0.0030 | 0.0045 | 0.0000 | 0.0418
06 safestack-bug1 x10*  | 0.0480 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.2440
07 WSQ                  | 0.0002 | 0.0484 | 0.0813 | 0.1054 | 0.1190 | 0.1444 | 0.0497
08 WSQ-State            | 0.0092 | 0.0003 | 0.0015 | 0.0017 | 0.0019 | 0.0146 | 0.0926
09 IWSQ-State           | 0.0643 | 0.0006 | 0.0040 | 0.0073 | 0.0121 | 0.0618 | 0.1380
10 IWSQ                 | 0.0010 | 0.0461 | 0.0775 | 0.0984 | 0.1183 | 0.1205 | 0.0500
11 reorder_5_bad        | 0.0018 | 0.0061 | 0.0110 | 0.0122 | 0.0126 | 0.0089 | 0.0668
12 queue_bad            | 0.9999 | 0.0068 | 0.1415 | 0.2621 | 0.3511 | 0.6176 | 0.9999
13 reorder_4_bad        | 0.0074 | 0.0118 | 0.0206 | 0.0263 | 0.0294 | 0.0294 | 0.0795
14 qsort_mt             | 0.0097 | 0.0117 | 0.0239 | 0.0328 | 0.0398 | 0.0937 | 0.0958
15 reorder_3_bad        | 0.0246 | 0.0255 | 0.0457 | 0.0580 | 0.0660 | 0.0920 | 0.0997
16 wronglock_bad        | 0.3272 | 0.0351 | 0.0630 | 0.0942 | 0.1142 | 0.2508 | 0.4227
17 bluetooth_driver_bad | 0.0628 | 0.0390 | 0.0597 | 0.0778 | 0.0791 | 0.1334 | 0.0847
18 radbench-bug6        | 0.3026 | 0.0461 | 0.0748 | 0.1011 | 0.1220 | 0.1435 | 0.2305
19 wronglock_3_bad      | 0.3095 | 0.0683 | 0.1137 | 0.1454 | 0.1741 | 0.2689 | 0.3625
20 twostage_bad         | 0.0806 | 0.1213 | 0.1959 | 0.2448 | 0.2804 | 0.2579 | 0.1212
21 deadlock01_bad       | 0.3668 | 0.0904 | 0.1714 | 0.2468 | 0.3160 | 0.8363 | 0.3315
22 account_bad          | 0.1173 | 0.2140 | 0.1929 | 0.1748 | 0.1628 | 0.1189 | 0.3367
23 token_ring_bad       | 0.1245 | 0.1367 | 0.1717 | 0.1923 | 0.2021 | 0.2171 | 0.1724
24 circular_buffer_bad  | 0.9159 | 0.1301 | 0.2888 | 0.4226 | 0.5180 | 0.7114 | 0.9369
25 carter01_bad         | 0.4706 | 0.1591 | 0.2974 | 0.4043 | 0.5007 | 0.9583 | 0.4999
26 ctrace-test          | 0.2380 | 0.2755 | 0.3342 | 0.3459 | 0.3453 | 0.2099 | 0.4680
27 pbzip2-0.9.4         | 0.3768 | 0.2321 | 0.2736 | 0.3048 | 0.3245 | 0.3609 | 0.6268
28 stack_bad            | 0.6051 | 0.2800 | 0.4060 | 0.4811 | 0.5365 | 0.7352 | 0.6210
29 lazy01_bad           | 0.6089 | 0.5386 | 0.5645 | 0.5906 | 0.6112 | 0.6887 | 0.3313
30 streamcluster3       | 0.3523 | 0.4970 | 0.5020 | 0.4979 | 0.5009 | 0.4849 | 0.4421
31 aget-bug2            | 0.4961 | 0.3993 | 0.4691 | 0.5036 | 0.5285 | 0.6117 | 0.9395
32 barnes               | 0.5180 | 0.5050 | 0.5049 | 0.5048 | 0.5052 | 0.5043 | 0.4846
Geo-mean*               | 0.0380 | 0.0213 | 0.0459 | 0.0604 | 0.0692 | 0.0694 | 0.1795
7 Conclusion


We have presented POS, a concurrency testing approach to sample the partial 
order of concurrent programs. POS’s core algorithm is simple and lightweight: 
(1) assign a random priority to each event in a program; (2) repeatedly execute 
the event with the highest priority; and (3) after executing an event, reassign 
its racing events with random priorities. We have formally shown that POS has 
an exponentially stronger probabilistic error-detection guarantee than existing 
randomized scheduling algorithms. Evaluations have shown that POS is effective 
in covering the partial-order space of micro-benchmarks and finding concurrency 
bugs in real-world programs such as Firefox’s JavaScript engine SpiderMonkey. 
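The three steps of POS summarized above can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: each thread is a fixed list of hypothetical (variable, is_write) events with no blocking; two events race when they come from different threads, access the same variable, and at least one writes; and "racing events" in step (3) is read as the pending next events of other threads that race with the executed event.

```python
import random
from collections import Counter

def races(a, b):
    """Events (var, is_write) race if they touch the same variable and one writes."""
    return a[0] == b[0] and (a[1] or b[1])

def pos_schedule(threads, rng=random):
    """Sample one interleaving as a list of (thread, event index) pairs."""
    # (1) assign a random priority to every event of the program
    prio = {(t, i): rng.random()
            for t, evs in enumerate(threads) for i in range(len(evs))}
    pc = [0] * len(threads)
    schedule = []
    while True:
        pending = [(t, pc[t]) for t in range(len(threads)) if pc[t] < len(threads[t])]
        if not pending:
            return schedule
        # (2) execute the pending event with the highest priority
        t, i = max(pending, key=lambda e: prio[e])
        schedule.append((t, i))
        pc[t] += 1
        # (3) give fresh random priorities to pending events racing with it
        for u, j in pending:
            if u != t and races(threads[t][i], threads[u][j]):
                prio[(u, j)] = rng.random()

# Usage: tally the interleavings sampled for a tiny two-thread program.
threads = [[("x", True), ("y", False)], [("x", True)]]
rng = random.Random(0)
counts = Counter(tuple(pos_schedule(threads, rng)) for _ in range(1000))
print(counts.most_common())
```

The priorities make each sample a single linear pass over the program, and re-randomizing only racing events leaves the relative order of independent events untouched.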
Abstract. We present a method for proving that a program running
under the Total Store Ordering (TSO) memory model is robust, i.e., all 
its TSO computations are equivalent to computations under the Sequen- 
tial Consistency (SC) semantics. This method is inspired by Lipton’s 
reduction theory for proving atomicity of concurrent programs. For pro- 
grams which are not robust, we introduce an abstraction mechanism 
that allows us to construct robust programs over-approximating their TSO
semantics. This enables the use of proof methods designed for the SC 
semantics in proving invariants that hold on the TSO semantics of a 
non-robust program. These techniques have been evaluated on a large 
set of benchmarks using the infrastructure provided by CIVL, a generic 
tool for reasoning about concurrent programs under the SC semantics. 


1 Introduction 


A classical memory model for shared-memory concurrency is Sequential Con- 
sistency (SC) [16], where the actions of different threads are interleaved while 
the program order between actions of each thread is preserved. For performance 
reasons, modern multiprocessors implement weaker memory models, e.g., Total 
Store Ordering (TSO) [19] in x86 machines, which relax the program order. For 
instance, the main feature of TSO is the write-to-read relaxation, which allows 
reads to overtake writes. This relaxation reflects the fact that writes are buffered 
before being flushed non-deterministically to the main memory. 
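To make the write-to-read relaxation concrete, the following Python sketch enumerates every execution of a small straight-line program under SC and under TSO, modeling TSO as described here: each thread has a FIFO store buffer, buffered writes are flushed non-deterministically, and a read consults the thread's own buffer before memory. The instruction encoding, the function outcomes, and the use of globally distinct register names are illustrative assumptions. On the classic store-buffering litmus test, the outcome r1 = r2 = 0 is reachable only under TSO, and a fence between each write and the following read rules it out again.

```python
def outcomes(threads, tso):
    """Final register valuations of a straight-line program under SC or TSO.

    Instructions: ("wr", var, value), ("rd", var, reg), ("fence",).
    Shared variables start at 0; register names must be distinct across threads.
    """
    results = set()

    def step(pcs, mem, bufs, regs):
        finished = True
        for t, prog in enumerate(threads):
            if bufs[t]:  # TSO: flush the oldest buffered write of thread t
                finished = False
                (x, v), rest = bufs[t][0], bufs[t][1:]
                step(pcs, {**mem, x: v}, bufs[:t] + (rest,) + bufs[t + 1:], regs)
            if pcs[t] == len(prog):
                continue
            finished = False
            op = prog[pcs[t]]
            nxt = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op[0] == "wr" and tso:    # write enters the thread's buffer
                step(nxt, mem, bufs[:t] + (bufs[t] + (op[1:],),) + bufs[t + 1:], regs)
            elif op[0] == "wr":          # SC: write goes straight to memory
                step(nxt, {**mem, op[1]: op[2]}, bufs, regs)
            elif op[0] == "rd":          # newest own buffered write, else memory
                own = [v for (y, v) in bufs[t] if y == op[1]]
                val = own[-1] if own else mem.get(op[1], 0)
                step(nxt, mem, bufs, {**regs, op[2]: val})
            elif op[0] == "fence" and not bufs[t]:  # blocks until buffer is empty
                step(nxt, mem, bufs, regs)
        if finished:
            results.add(tuple(sorted(regs.items())))

    n = len(threads)
    step((0,) * n, {}, ((),) * n, {})
    return results

sb = [[("wr", "x", 1), ("rd", "y", "r1")], [("wr", "y", 1), ("rd", "x", "r2")]]
sb_fenced = [[("wr", "x", 1), ("fence",), ("rd", "y", "r1")],
             [("wr", "y", 1), ("fence",), ("rd", "x", "r2")]]
both_zero = (("r1", 0), ("r2", 0))
print(both_zero in outcomes(sb, tso=False))        # False
print(both_zero in outcomes(sb, tso=True))         # True
print(both_zero in outcomes(sb_fenced, tso=True))  # False
```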

Nevertheless, programmers usually assume that memory accesses happen
instantaneously and atomically, as in the SC memory model. This assumption
is safe for data-race free programs [3]. However, many programs employing
lock-free synchronization are not data-race free, e.g., programs implementing 
synchronization operations and libraries implementing concurrent objects. In 
most cases, these programs are designed to be robust against relaxations, i.e.,
they admit the same behaviors as if they were run under SC. Memory fences 
must be included appropriately in programs in order to prevent non-SC behav- 
iors. Getting such programs right is a notoriously difficult and error-prone task. 
Robustness can also be used as a proof method that allows reusing the existing
SC verification technology. Invariants of a robust program running under SC are 
also valid for the TSO executions. Therefore, the problem of checking robustness 
of a program against relaxations of a memory model is important. 

In this paper, we address the problem of checking robustness in the case 
of TSO. We present a methodology for proving robustness which uses the con- 
cepts of left/right mover in Lipton’s reduction theory [17]. Intuitively, a program 
statement is a left (resp., right) mover if it commutes to the left (resp., right) 
with respect to the statements in the other threads. These concepts have been 
used by Lipton [17] to define a program rewriting technique which enlarges the 
atomic blocks in a given program while preserving the same set of behaviors. In 
essence, robustness can also be seen as an atomicity problem: every write state- 
ment corresponds to two events, inserting the write into the buffer and flushing 
the write from the buffer to the main memory, which must be proved to happen 
atomically, one after the other. However, unlike in Lipton’s reduction theory,
the events that must be proved atomic do not correspond syntactically to
different statements in the program. This leads to different uses of these concepts 
which cannot be seen as a direct instantiation of this theory. 

In case programs are not robust, or they cannot be proven robust using our 
method, we define a program abstraction technique that, roughly, makes reads
non-deterministic (this follows the idea of combining reduction and abstraction 
introduced in [12]). The non-determinism added by this abstraction can lead to 
programs which can be proven robust using our method. Then, any invariant 
(safety property) of the abstraction, which is valid under the SC semantics, is 
also valid for the TSO semantics of the original program. As shown in our exper- 
iments, this abstraction leads in some cases to programs which reach exactly the 
same set of configurations as the original program (but these configurations can 
be reached in different orders), which implies no loss of precision. 

We tested the applicability of the proposed reduction and abstraction based 
techniques on an exhaustive benchmark suite containing 34 challenging programs 
(from [2,7]). These techniques were precise enough for proving robustness of 32 
of these programs. One program (presented in Fig. 3) is not robust, and required 
abstraction in order to derive a robust over-approximation. There is only one 
program which cannot be proved robust using our techniques (although it is 
robust). We believe however that an extension of our abstraction mechanism to 
atomic read-write instructions will be able to deal with this case. We leave this 
question for future work. 

An extended version of this paper with missing proofs can be found at [8]. 


2 Overview 


The TSO memory model allows strictly more behaviors than the clas-
sic SC memory model: writes are first stored in a thread-local buffer and
non-deterministically flushed into the shared memory at a later time (also, the
write buffers are accessed first when reading a shared variable). However, in prac-
tice, many programs are robust, i.e., they have exactly the same behaviors under
TSO and SC. Robustness implies, for instance, that any invariant proved under
the SC semantics is also an invariant under the TSO semantics. In the following,
we describe a sound methodology for checking that a program is robust,
which avoids modeling and verifying TSO behaviors. Moreover, for non-robust
programs, we show an abstraction mechanism that allows us to obtain robust pro-
grams over-approximating the behaviors of the original program.


procedure send(){
  y := r1;
  x := 1;
}

procedure recv(){
  do{
    r1 := x;
  }while(r1 == 0);
  r2 := y;
}

[Sample trace graph not recoverable from the source.]


Fig. 1. An example message passing program and a sample trace. Edges of the trace
show the happens-before order of global accesses and are simplified by applying
transitive reduction.

As a first example, consider the simple “message passing” program in Fig. 1. 
The send method sets the value of the “communication” variable y to some 
predefined value from register r1. Then, it raises a flag by setting the variable 
x to 1. Another thread executes the method recv which waits until the flag is 
set and then reads y (and stores the value to register r2). This program is
robust: TSO does not enable new behaviors even though the writes may be delayed.
For instance, consider the following TSO execution (we assume that r1 = 42): 


(t1, isu) (t1, isu) (t1, com, y, 42) (t1, com, x, 1)
(t2, rd, x, 0) (t2, rd, x, 0) (t2, rd, x, 1) (t2, rd, y, 42)


The actions of each thread (t1 or t2) are aligned horizontally; they are either issue
actions (isu) for writes being inserted into the local buffer (e.g., the first (t1, isu)
represents the write of y being inserted into the buffer), commit actions (com) for
writes being flushed to the main memory (e.g., (t1, com, y, 42) represents the
write y := 42 being flushed and executed on the shared memory), and read
actions for reading values of shared variables. Every assignment generates two
actions, an issue and a commit. The issue action is “local”: it does not enable or
disable actions of other threads.

The above execution can be “mimicked” by an SC execution. If we had not 
performed the isu actions of t1 that early but delayed them until just before
their corresponding com actions, we would obtain a valid SC execution of the 
same program with no need to use store buffers: 
(t1, wr, y, 42) (t1, wr, x, 1)
(t2, rd, x, 0) (t2, rd, x, 0) (t2, rd, x, 1) (t2, rd, y, 42)


Above, consecutive isu and com actions are combined into a single write action 
(wr). This intuition corresponds to an equivalence relation between TSO exe- 
cutions and SC executions: if both executions contain the same actions on the 
shared variables (performing the same accesses on the same variables with the 
same values) and the order of actions on the same variable is the same for
both executions, we say that these executions have the same trace [20], or that 
they are trace-equivalent. For instance, both the SC and TSO executions given 
above have the same trace given in Fig. 1. The notion of trace is used to formal- 
ize robustness for programs running under TSO [7]: a program is called robust 
when every TSO execution has the same trace as an SC execution. 

Our method for showing robustness is based on proving that every TSO exe-
cution can be permuted to a trace-equivalent SC execution (where issue actions
are immediately followed by the corresponding commit actions). We say that an
action $\alpha$ moves right until another action $\beta$ in an execution if we can swap $\alpha$
with every later action until $\beta$ while preserving the feasibility of the execution
(e.g., not invalidating reads and keeping the actions enabled). We observe that
if $\alpha$ moves right until $\beta$, then the execution obtained by moving $\alpha$ just before
$\beta$ has the same trace as the initial execution. We also have the dual notion
of moves-left with a similar property. As a corollary, if every issue action moves
right until the corresponding commit action or every commit action moves left
until the corresponding issue action, we can find an equivalent SC execution. For
our execution above, the issue actions of the first thread move right until their
corresponding com actions. Note that there is a commit action which does not
move left: moving (t1, com, x, 1) to the left of (t2, rd, x, 0) is not possible since it
would disable this read.
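The moves-right check can be sketched by replaying the execution after each adjacent swap. In this Python sketch the action encoding, the functions feasible and moves_right_until, and the choice to let issue actions carry their variable and value are assumptions; feasibility only checks that commits follow issue order, that each read sees the value it records, and that buffers end empty. A swap of two actions of the same thread is rejected unless one of them is a commit, since commits are not part of the program order. The usage lines replay one interleaving consistent with the message-passing execution above.

```python
from collections import defaultdict

def feasible(execution):
    """Replay ("isu" | "com" | "rd", thread, var, value) actions under TSO."""
    mem, buf = defaultdict(int), defaultdict(list)
    for kind, t, x, v in execution:
        if kind == "isu":
            buf[t].append((x, v))
        elif kind == "com":
            if not buf[t] or buf[t][0] != (x, v):
                return False          # commits must follow issue order
            buf[t].pop(0)
            mem[x] = v
        else:                         # "rd": own buffer first, then memory
            own = [w for (y, w) in buf[t] if y == x]
            if (own[-1] if own else mem[x]) != v:
                return False          # the read would observe another value
    return not any(buf.values())      # all buffers drained at the end

def moves_right_until(execution, i, j):
    """Swap action i with each later action until it sits just before j."""
    ex = list(execution)
    for k in range(i, j - 1):
        a, b = ex[k], ex[k + 1]
        if a[1] == b[1] and "com" not in (a[0], b[0]):
            return False              # would break program order
        ex[k], ex[k + 1] = b, a
        if not feasible(ex):
            return False
    return True

tso = [("isu", "t1", "y", 42), ("isu", "t1", "x", 1), ("rd", "t2", "x", 0),
       ("com", "t1", "y", 42), ("rd", "t2", "x", 0), ("com", "t1", "x", 1),
       ("rd", "t2", "x", 1), ("rd", "t2", "y", 42)]
print(moves_right_until(tso, 1, 5))  # True: the issue of x := 1 reaches its commit
print(moves_right_until(tso, 4, 6))  # False: (t1, com, x, 1) cannot pass this read
```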

In general, issue actions and other thread-local actions (e.g., statements using
local registers only) move right of other threads’ actions. Moreover, issue actions 
(t, isu) move right of commit actions of the same thread that correspond to writes 
issued before (t,isu). For the message passing program, the issue actions move 
right until their corresponding commits in all TSO executions since commits 
cannot be delayed beyond actions of the same thread (for instance reads). Hence, 
we can safely deduce that the message passing program is robust. However, this 
reasoning may fail when an assignment is followed by a read of a shared variable 
in the same thread. 


procedure foo(){
  x := 1;
  r1 := z;
  fence;
  r2 := y;
}

procedure bar(){ ... }    [body not recoverable from the source]


Fig. 2. An example store buffering program.


Consider the “store-buffering”-like program in Fig. 2. This program is also
robust. However, the issue action generated by x := 1 might not always move
right until the corresponding commit. Consider the following execution (we
assume that, initially, z = 5):
(t1, isu) (t1, rd, z, 5) (t1, com, x, 1) ...
(t2, isu) (t2, com, y, 1) (t2, τ) (t2, rd, x, 0)


Here, we assumed that t1 executes foo and t2 executes bar. The fence instruc-
tion generates an action τ. The first issue action of t1 cannot be moved to the
right until the corresponding commit action since this would violate the program 
order. Moreover, the corresponding commit action does not move left due to the 
read action of t2 on x (which would become infeasible). 

The key point here is that a later read action by the same thread, (t1, rd, z, 5),
prevents moving the issue action to the right (until the commit). However,
this read action moves to the right of other threads' actions. So, we can construct
an equivalent SC execution by first moving the read action right after the commit
(t1, com, x, 1) and then moving the issue action right until the commit action.

In general, we say that an issue (t, isu) of a thread t moves right until the 
corresponding commit if each read action of t after (t, isu) can move right until 
the next action of t that follows both the read and the commit. Actually, this 
property is not required for all such reads. The read actions that follow a fence 
cannot happen between the issue and the corresponding commit actions. For 
instance, the last read action of foo cannot happen between the first issue of 
foo and its corresponding commit action. Such reads that follow a fence are not 
required to move right. In addition, we can omit the right-moves check for read 
actions that read from the thread local buffer (see Sect. 3 for more details). 

In brief, our method for checking robustness does the following for every write 
instruction (assignment to a shared variable): either the commit action of this 
write moves left or the actions of later read instructions that come before a fence 
move right in all executions. This semantic condition can be checked using the 
concept of movers [17] as follows: every write instruction is either a left-mover 
or all the read instructions that come before a fence and can be executed later 
than the write (in an SC execution) are right-movers. Note that this requires no 
modeling and verification of TSO executions. 
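The syntactic check just described can be sketched as a reachability pass over one thread's control-flow graph. In this Python sketch the graph encoding and the function robust_by_movers are hypothetical, and the mover classification is an input assumed to be computed elsewhere; cas is treated like fence because it requires an empty buffer, and the exemption for reads served from the thread's own buffer is not modeled, so the sketch is slightly more conservative than the check described above.

```python
def robust_by_movers(succ, kind, left_movers, right_movers):
    """Sufficient robustness condition for one thread.

    succ:  label -> labels that may execute next in the same thread
    kind:  label -> "write", "read", "fence", "cas" or "other"
    left_movers / right_movers: sets of labels (assumed given)
    """
    for w, k in kind.items():
        if k != "write" or w in left_movers:
            continue
        # Collect instructions reachable from w without passing a fence/cas.
        seen, stack = set(), list(succ.get(w, ()))
        while stack:
            l = stack.pop()
            if l in seen:
                continue
            seen.add(l)
            if kind[l] not in ("fence", "cas"):
                stack.extend(succ.get(l, ()))
        # Every such read must be a right mover.
        if any(kind[l] == "read" and l not in right_movers for l in seen):
            return False
    return True

# foo of Fig. 2 with hypothetical labels: L1 x := 1; L2 r1 := z; L3 fence; L4 r2 := y
succ = {"L1": ["L2"], "L2": ["L3"], "L3": ["L4"], "L4": []}
kind = {"L1": "write", "L2": "read", "L3": "fence", "L4": "read"}
print(robust_by_movers(succ, kind, set(), {"L2"}))  # True: L4 sits after the fence
print(robust_by_movers(succ, kind, set(), set()))   # False: L2 must be a right mover
```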

For non-robust programs that might reach different configurations under 
TSO than under SC, we define an abstraction mechanism that replaces read 
instructions with “non-deterministic” reads that can read more values than the 
original instructions. The abstracted program has more behaviors than the original
one (under both SC and TSO), but it may turn out to be robust. When it is
robust, we get that any property of its SC semantics holds also for the TSO 
semantics of the original program. 

Consider the work stealing queue implementation in Fig. 3. A queue is rep-
resented with an array items. Its head and tail indices are stored in the shared
variables H and T, respectively. There are three procedures that can operate on
this queue: any number of threads may execute the steal method and remove
an element from the head of the queue, and a single thread (the worker) may
nondeterministically execute the put or take methods. The put method inserts an element
at the tail index and the take method removes an element from the tail index.
var H,T,items;

procedure put(var elt){
  local t;
  t := T;
  items[t] := elt;
  T := t+1;
}

procedure take(){
  local h,t,res;
  ...
}

procedure steal(){
  local h,t,res;
  ...
}

[The bodies of take and steal are not recoverable from the source. In take, the
read h := H is annotated with its abstraction havoc(h, h ≤ H).]


Fig. 3. Work stealing queue.


This program is not robust. If there is a single element in the queue and the 
take method takes it by delaying its writes after some concurrent steals, one 
of the concurrent steals might also remove this last element. Popping the same 
element twice is not possible under SC, but it is possible under TSO semantics. 
However, we can still prove some properties of this program under TSO. Our 
robustness check fails on this program because the writes of the worker thread 
(executing the put and take methods) are not left movers and the read from 
the variable H in the take method is not a right mover. This read is not a right 
mover w.r.t. successful CAS actions of the steal procedure that increment H. 

We apply an abstraction to the instruction of the take method that reads
from H such that, instead of reading the exact value of H, it can read any value
less than or equal to the value of H. We write this instruction as havoc(h, h ≤ H)
(it assigns to h a nondeterministic value satisfying the constraint h ≤ H). Note
that this abstraction is sound in the sense that, under SC and under TSO, it reaches
every state reached by the original program, and possibly more.

The resulting program is robust. The statement havoc(h, h ≤ H) is a right
mover w.r.t. successful CAS actions of the stealer threads. Hence, for all the 
write instructions, the reachable read instructions become right movers and our 
check succeeds. The abstract program satisfies the specification of an idempotent 
work stealing queue (elements can be dequeued multiple times) which implies 
that the original program satisfies this specification as well. 


3 TSO Robustness 


We present the syntax and the semantics of a simple programming language
used to state our results. We define both the TSO and the SC semantics, an
abstraction of executions called trace [20] that, intuitively, captures the happens-
before relation between actions in an execution, and the notion of robustness.


⟨prog⟩   ::= program ⟨pid⟩ vars ⟨var⟩* ⟨thread⟩*
⟨thread⟩ ::= thread ⟨tid⟩ regs ⟨reg⟩* init ⟨label⟩ begin ⟨linst⟩* end
⟨linst⟩  ::= ⟨label⟩: ⟨inst⟩; goto ⟨label⟩;
⟨inst⟩   ::= ⟨var⟩ := ⟨expr⟩
           | ⟨reg⟩ := ⟨expr⟩
           | ⟨reg⟩ := ⟨var⟩
           | fence
           | ⟨reg⟩ := cas(⟨var⟩, ⟨expr⟩, ⟨expr⟩)
           | skip
           | assume ⟨bexpr⟩


Fig. 4. Syntax of the programs. The star (*) indicates zero or more occurrences of
the preceding element. ⟨pid⟩, ⟨tid⟩, ⟨var⟩, ⟨reg⟩ and ⟨label⟩ are elements of their given
domains representing the program identifiers, thread identifiers, shared variables, regis-
ters and instruction labels, respectively. ⟨expr⟩ is an arithmetic expression over ⟨reg⟩*.
Similarly, ⟨bexpr⟩ is a boolean expression over ⟨reg⟩*.


Syntax. We consider a simple programming language which is defined in Fig. 4.
Each program P has a finite set of shared variables $X$ and a finite set of
threads $\vec{t}$. Also, each thread $t_i$ has a finite set of local registers $\vec{r}_i$
and a start label $l^0_i$. Bodies of the threads are defined as finite sequences of
labelled instructions. Each instruction is followed by a goto statement which
defines the evolution of the program counter. Note that multiple instructions
can be assigned to the same label, which allows us to write non-deterministic
programs, and multiple goto statements can direct the control to the same label,
which allows us to mimic imperative constructs like loops and conditionals. An
assignment to a shared variable ⟨var⟩ := ⟨expr⟩ is called a write instruction.
Also, an instruction of the form ⟨reg⟩ := ⟨var⟩ is called a read instruction.

Instructions can read from or write to shared variables or registers. Each 
instruction accesses at most one shared variable. We assume that the program 
P comes with a domain D of values that are stored in variables and registers, 
and a set of functions F used to calculate arithmetic and boolean expressions. 

The fence statement empties the buffer of the executing thread. The cas
(compare-and-swap) instruction checks whether the value of its input variable
is equal to its second argument. If so, it writes its third argument to the variable
and returns true. Otherwise, it returns false. In either case, cas
empties the buffer immediately after it executes. The assume statement allows
us to check conditions. If the boolean expression it contains holds at that state,
it behaves like a skip. Otherwise, the execution blocks. A formal description of
the instructions is given in Fig. 5.
$$\frac{x := ae(\vec{r}_t) \in ins(pc(t)) \qquad v = eval(ae(\vec{r}_t))}{(pc, mem, buf) \xrightarrow{(t,\, isu)}_{TSO} (pc', mem, buf[t \mapsto buf(t) \circ ((x, v))])}$$

$$\frac{buf(t) = ((x, v)) \circ buf'}{(pc, mem, buf) \xrightarrow{(t,\, com,\, x,\, v)}_{TSO} (pc, mem[x \mapsto v], buf[t \mapsto buf'])}$$

$$\frac{r := ae(\vec{r}_t) \in ins(pc(t)) \qquad v = eval(ae(\vec{r}_t))}{(pc, mem, buf) \xrightarrow{(t,\, \tau)}_{TSO} (pc', mem[r \mapsto v], buf)}$$

$$\frac{r := x \in ins(pc(t)) \qquad x \in X \qquad v = mem(x) \qquad x \notin varsOfBuf(buf(t))}{(pc, mem, buf) \xrightarrow{(t,\, rd,\, x,\, v)}_{TSO} (pc', mem[r \mapsto v], buf)}$$

$$\frac{r := x \in ins(pc(t)) \qquad x \in X \qquad buf(t) = \alpha \circ ((x, v)) \circ \beta \qquad x \notin varsOfBuf(\beta)}{(pc, mem, buf) \xrightarrow{(t,\, rd,\, x,\, v)}_{TSO} (pc', mem[r \mapsto v], buf)}$$

$$\frac{fence \in ins(pc(t)) \qquad buf(t) = \epsilon}{(pc, mem, buf) \xrightarrow{(t,\, \tau)}_{TSO} (pc', mem, buf)}$$

$$\frac{r := cas(x, ae_1(\vec{r}_t), ae_2(\vec{r}_t)) \in ins(pc(t)) \qquad mem(x) = eval(ae_1(\vec{r}_t)) \qquad buf(t) = \epsilon \qquad v = eval(ae_2(\vec{r}_t))}{(pc, mem, buf) \xrightarrow{(t,\, isu)(t,\, com,\, x,\, v)}_{TSO} (pc', mem[r \mapsto 1][x \mapsto v], buf)}$$

$$\frac{r := cas(x, ae_1(\vec{r}_t), ae_2(\vec{r}_t)) \in ins(pc(t)) \qquad mem(x) \neq eval(ae_1(\vec{r}_t)) \qquad buf(t) = \epsilon \qquad v = mem(x)}{(pc, mem, buf) \xrightarrow{(t,\, rd,\, x,\, v)}_{TSO} (pc', mem[r \mapsto 0], buf)}$$

$$\frac{assume\ be(\vec{r}_t) \in ins(pc(t)) \qquad eval(be(\vec{r}_t)) = \top}{(pc, mem, buf) \xrightarrow{(t,\, \tau)}_{TSO} (pc', mem, buf)}$$


Fig. 5. The TSO transition relation. The function ins takes a label l and returns the set
of instructions labelled by l. We always assume that $x \in X$, $r \in \vec{r}_t$ and $pc' = pc[t \mapsto l']$,
where $pc(t): inst$ goto $l'$; is a labelled instruction of t and inst is the instruction
described at the beginning of the rule. The evaluation function eval calculates the
value of an arithmetic or boolean expression based on mem (ae stands for arithmetic
expression). Sequence concatenation is denoted by $\circ$. The function varsOfBuf takes
a sequence of pairs and returns the set consisting of the first fields of these pairs.


TSO Semantics. Under the TSO memory model, each thread maintains a local
queue to buffer write instructions. A state s of the program is a triple of the
form (pc, mem, buf). Let $\mathcal{L}$ be the set of available labels in the program P.
Then, $pc : \vec{t} \to \mathcal{L}$ shows the next instruction to be executed for each thread,
$mem : X \cup \bigcup_{t_i \in \vec{t}} \vec{r}_i \to D$ represents the current values in shared variables and
registers, and $buf : \vec{t} \to (X \times D)^*$ represents the contents of the buffers.
There is a special initial state $s_0 = (pc_0, mem_0, buf_0)$. At the beginning, each
thread $t_i$ points to its initial label $l^0_i$, i.e., $pc_0(t_i) = l^0_i$. We assume that there is a
special default value $0 \in D$. All the shared variables and registers are initialized to
0, i.e., $mem_0(x) = 0$ for all $x \in X \cup \bigcup_{t_i \in \vec{t}} \vec{r}_i$. Lastly, all the buffers are initially
empty, i.e., $buf_0(t_i) = \epsilon$ for all $t_i \in \vec{t}$.
The transition relation $\to_{TSO}$ between program states is defined in Fig. 5.
Transitions are labelled by actions. Each action is an element from
$\vec{t} \times (\{\tau, isu\} \cup (\{com, rd\} \times X \times D))$. Actions keep the information about the thread performing
the transition and the actual parameters of the reads and the writes to shared
variables. We are only interested in accesses to shared variables; therefore, other
transitions are labelled with $\tau$ as thread-local actions.


A TSO execution of a program P is a sequence of actions $\pi = \pi_1, \pi_2, \ldots, \pi_n$
such that there exists a sequence of states $\sigma = \sigma_0, \sigma_1, \ldots, \sigma_n$, where $\sigma_0 = s_0$ is the
initial state of P and $\sigma_{i-1} \xrightarrow{\pi_i} \sigma_i$ is a valid transition for any $i \in \{1, \ldots, n\}$. We
assume that buffers are empty at the end of the execution.


SC Semantics. Under SC, a program state is a pair of the form (pc,mem) 
where pc and mem are defined as above. Shared variables are read directly 
from the memory mem and every write updates directly the memory mem. 
To make the relationship between SC and TSO executions more obvious, every 
write instruction generates isu and com actions which follow one another in the 
execution (each isu is immediately followed by the corresponding com). Since 
there are no write buffers, fence instructions have no effect under SC. 


Traces and TSO Robustness. Consider a (TSO or SC) execution $\pi$ of P. The
trace of $\pi$ is a graph, denoted by $Tr(\pi)$: nodes of $Tr(\pi)$ are actions of $\pi$ except
the $\tau$ actions. In addition, isu and com actions are unified in a single node.
The isu action that puts an element into the buffer and the corresponding com
action that drains that element from the buffer correspond to the same node
in the trace. Edges of $Tr(\pi)$ represent the happens-before order (hb) between
these actions. The relation hb is the union of four relations. The program order po keeps the
order of actions performed by the same thread excluding the com actions. The
store order so keeps the order of com actions on the same variable that write
different values¹. The read-from relation, denoted by rf, relates a com action to
a rd action that reads its value. Lastly, the from-reads relation fr relates a rd
action to a com action that overwrites the value read by rd; it is defined as the
composition of the inverse of rf with so.

We say that the program P is TSO robust if for any TSO execution $\pi$ of P,
there exists an SC execution $\pi'$ such that $Tr(\pi) = Tr(\pi')$. It has been proven
that robustness implies that the program reaches the same valuations of the
shared memory under both TSO and SC [7].
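A trace can be computed from an execution as in the following Python sketch. The action encoding, the function trace, and the explicit init nodes for initial values are assumptions, and issue actions here carry their variable and value, unlike in the paper. The relation po is stored as consecutive pairs of the same thread, which suffices for comparing traces; so follows the value-sensitive definition of footnote 1, and fr is computed as the inverse of rf composed with so. Comparing one interleaving consistent with the message-passing TSO execution of Sect. 2 against its SC counterpart yields equal traces, as robustness requires.

```python
from collections import defaultdict

def trace(execution):
    """Return (node labels, hb edges) of a TSO or SC execution.

    Actions: ("isu", t, x, v), ("com", t, x, v), ("rd", t, x, v), ("tau", t).
    """
    labels, edges = {}, set()    # node -> (kind, var, value); (rel, src, dst)
    count, last = defaultdict(int), {}
    pending = defaultdict(list)  # thread -> FIFO buffer of write nodes
    committed = {}               # var -> write nodes in commit order

    def new_node(t, label):
        node = (t, count[t])
        count[t] += 1
        labels[node] = label
        if t in last:
            edges.add(("po", last[t], node))
        last[t] = node
        return node

    def writes_to(x):
        if x not in committed:   # implicit initial write of 0
            labels[("init", x)] = ("wr", x, 0)
            committed[x] = [("init", x)]
        return committed[x]

    for action in execution:
        kind, t = action[0], action[1]
        if kind == "isu":        # an isu and its com share one node
            pending[t].append(new_node(t, ("wr",) + action[2:]))
        elif kind == "com":
            if not pending[t] or labels[pending[t][0]] != ("wr",) + action[2:]:
                raise ValueError(f"{action} does not match the buffer head")
            node = pending[t].pop(0)
            for w in writes_to(action[2]):   # so: earlier writes, other value
                if labels[w][2] != action[3]:
                    edges.add(("so", w, node))
            committed[action[2]].append(node)
        elif kind == "rd":       # own buffer first, then memory
            x, v = action[2], action[3]
            own = [n for n in pending[t] if labels[n][1] == x]
            src = own[-1] if own else writes_to(x)[-1]
            if labels[src][2] != v:
                raise ValueError(f"{action} is not feasible")
            edges.add(("rf", src, new_node(t, ("rd", x, v))))
        # "tau" actions produce no node

    if any(pending.values()):
        raise ValueError("buffers must be empty at the end")
    rf = [(w, r) for (rel, w, r) in edges if rel == "rf"]
    so = [(a, b) for (rel, a, b) in edges if rel == "so"]
    edges |= {("fr", r, b) for (w, r) in rf for (a, b) in so if a == w}
    return labels, frozenset(edges)

tso = [("isu", "t1", "y", 42), ("isu", "t1", "x", 1), ("rd", "t2", "x", 0),
       ("com", "t1", "y", 42), ("rd", "t2", "x", 0), ("com", "t1", "x", 1),
       ("rd", "t2", "x", 1), ("rd", "t2", "y", 42)]
sc = [("isu", "t1", "y", 42), ("com", "t1", "y", 42), ("rd", "t2", "x", 0),
      ("rd", "t2", "x", 0), ("isu", "t1", "x", 1), ("com", "t1", "x", 1),
      ("rd", "t2", "x", 1), ("rd", "t2", "y", 42)]
print(trace(tso) == trace(sc))  # True
```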


4 A Reduction Theory for Checking Robustness 


We present a methodology for checking robustness which builds on concepts 
introduced in Lipton’s reduction theory [17]. This theory allows one to rewrite a


1 Our definition of store order deviates slightly from the standard definition, which
relates any two writes to the same variable, independently of their values. The
notion of TSO trace robustness induced by this change is slightly weaker than the 
original definition, but still implies preservation of any safety property from the SC 
semantics to the TSO semantics. The results concerning TSO robustness used in 
this paper (Lemma 1) are also not affected by this change. See [8] for more details. 
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given concurrent program (running under SC) into an equivalent one that has 
larger atomic blocks. Proving robustness is similar in spirit in the sense that 
one has to prove that issue and commit actions can happen together atomically. 
However, differently from the original theory, these actions do not correspond 
to different statements in the program (they are generated by the same write 
instruction). Nevertheless, we show that the concepts of left/right movers can 
be also used to prove robustness. 


Movers. Let $\pi = \pi_1, \ldots, \pi_n$ be an SC execution. We say that the action $\pi_i$ moves right (resp., left) in $\pi$ if the sequence $\pi_1, \ldots, \pi_{i-1}, \pi_{i+1}, \pi_i, \pi_{i+2}, \ldots, \pi_n$ (resp., $\pi_1, \ldots, \pi_{i-2}, \pi_i, \pi_{i-1}, \pi_{i+1}, \ldots, \pi_n$) is also a valid execution of P, the thread of $\pi_i$ is different from the thread of $\pi_{i+1}$ (resp., $\pi_{i-1}$), and both executions reach the same end state. Since every issue action is immediately followed by the corresponding commit action, an issue action moves right (resp., left) exactly when the commit action also moves right (resp., left), and vice versa.
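To make the definition concrete, the following Python sketch checks whether an action moves right or left in a recorded SC execution. It is a hypothetical simplification: a write is a single action (under SC each isu is immediately followed by its com), a read records the value it observed, and the end state is approximated by the final shared memory only (registers and program counters are ignored).

```python
# Hypothetical SC action: (thread, kind, var, value), kind is "w" or "rd".

def replay(actions, init):
    """Run the actions under SC; return the final memory, or None if a read
    would observe a value different from the one it recorded."""
    mem = dict(init)
    for _thread, kind, var, value in actions:
        if kind == "w":
            mem[var] = value
        elif mem.get(var) != value:
            return None
    return mem

def moves_right(pi, i, init):
    """pi[i] moves right: it belongs to a different thread than pi[i + 1], and
    swapping the two gives a valid execution with the same end state."""
    if i + 1 >= len(pi) or pi[i][0] == pi[i + 1][0]:
        return False
    swapped = pi[:i] + [pi[i + 1], pi[i]] + pi[i + 2:]
    end = replay(pi, init)
    return end is not None and replay(swapped, init) == end

def moves_left(pi, i, init):
    """pi[i] moves left iff pi[i - 1] moves right (it is the same swap)."""
    return i >= 1 and moves_right(pi, i - 1, init)

init = {"x": 0, "y": 0}
pi = [("t1", "w", "x", 1), ("t2", "rd", "y", 0), ("t2", "rd", "x", 1)]
print(moves_right(pi, 0, init))   # True: the write commutes with the read of y
pi2 = [("t1", "w", "x", 1), ("t2", "rd", "x", 1)]
print(moves_right(pi2, 0, init))  # False: the read would then see 0
```

A labelled instruction is a right (left) mover only if such a check succeeds for every occurrence in every SC execution, which is why the paper relies on a verifier rather than on testing individual executions.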

Let $\mathit{instOf}_\pi$ be a function, depending on an execution $\pi$, which, given an action $\pi_i \in \pi$, gives the labelled instruction that generated $\pi_i$. Then, a labelled instruction $\ell$ is a right (resp., left) mover if for all SC executions $\pi$ of P and for all actions $\pi_i$ of $\pi$ such that $\mathit{instOf}_\pi(\pi_i) = \ell$, $\pi_i$ moves right (resp., left) in $\pi$.

A labelled instruction is a non-mover if it is neither a left nor a right mover, and it is a both mover if it is both a left and a right mover.


Reachability Between Instructions. An instruction $\ell'$ is reachable from the instruction $\ell$ if $\ell$ and $\ell'$ both belong to the same thread and there exist an SC execution $\pi$ and indices $1 \le i < j \le |\pi|$ such that $\mathit{instOf}_\pi(\pi_i) = \ell$ and $\mathit{instOf}_\pi(\pi_j) = \ell'$. We say that $\ell'$ is reachable from $\ell$ before a fence if $\pi_k$ is not an action generated by a fence instruction in the same thread as $\ell$, for all $i < k < j$. When $\ell$ is a write instruction and $\ell'$ a read instruction, we say that $\ell'$ is buffer-free reachable from $\ell$ if, for all $i < k < j$, $\pi_k$ is neither an action generated by a fence instruction in the same thread as $\ell$ nor a write action on the same variable that $\ell'$ reads from.


Definition 1. We say that a write instruction $\ell_w$ is atomic if it is a left mover or every read instruction $\ell_r$ buffer-free reachable from $\ell_w$ is a right mover. We say that P is write atomic if every write instruction $\ell_w$ in P is atomic.
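Assuming the mover and buffer-free reachability facts have already been established (for instance with a verifier, as in Sect. 6), Definition 1 reduces to a simple check. In this Python sketch, the function names and the label-based encoding are hypothetical. A write with no buffer-free reachable read is trivially atomic, because the subset test on an empty set succeeds.

```python
def non_atomic_writes(writes, left_movers, right_movers, bf_reachable):
    """Write labels that violate Definition 1.

    writes:       labels of all write instructions of P
    left_movers:  set of labels known to be left movers
    right_movers: set of labels known to be right movers
    bf_reachable: write label -> set of read labels buffer-free reachable from it
    """
    return [w for w in writes
            if w not in left_movers
            and not bf_reachable.get(w, set()) <= right_movers]

def write_atomic(writes, left_movers, right_movers, bf_reachable):
    """P is write atomic iff no write instruction violates Definition 1."""
    return not non_atomic_writes(writes, left_movers, right_movers, bf_reachable)

# Hypothetical labels: w1 is not a left mover, but its only reachable read r1 is a right mover.
reach = {"w1": {"r1"}, "w2": {"r2"}}
print(non_atomic_writes(["w1", "w2"], {"w2"}, {"r1"}, reach))   # []
print(non_atomic_writes(["w1", "w2"], {"w2"}, set(), reach))    # ['w1']
```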


Note that all of the notions used to define write atomicity (movers and instruction reachability) are based on SC executions of the programs. The following result shows that write atomicity implies robustness.


Theorem 1 (Soundness). If P is write atomic, then it is robust. 


We will prove the contrapositive of the statement. For the proof, we need the notion of minimal violation defined in [7]. A minimal violation is a TSO execution in which the sum, over all writes, of the number of same-thread actions between an isu action and the corresponding com action is minimal. A minimal violation is of the form $\pi = \pi_1, (t, \mathit{isu}), \pi_2, (t, \mathit{rd}, y, *), \pi_3, (t, \mathit{com}, x, *), \pi_4$ such that: $\pi_1$ is an SC execution; only t can delay com actions; the first delayed action is the $(t, \mathit{com}, x, *)$ action after $\pi_3$, and it corresponds to the $(t, \mathit{isu})$ action after $\pi_1$; $\pi_2$ does not contain any com or fence actions by t (writes of t are delayed until after $(t, \mathit{rd}, y, *)$); $(t, \mathit{rd}, y, *) \to_{hb}^{+} \mathit{act}$ for all $\mathit{act} \in \pi_3 \circ \{(t, \mathit{com}, x, *)\}$ (isu and com actions of other threads are counted as one action for this case); $\pi_3$ does not contain any action of t; $\pi_4$ contains only and all of the com actions of t that are delayed in $(t, \mathit{isu}) \circ \pi_2$; and no com action in $(t, \mathit{com}, x, *) \circ \pi_4$ touches y. Minimal violations are important for us because of the following property:


Lemma 1 (Completeness of Minimal Violations [7]). The program P is 
robust iff it does not have a minimal violation. 


Before going into the proof of Theorem 1, we define some notation. Let $\pi$ be a sequence representing an execution or a fragment of it, and let Q be a set of thread identifiers. Then, $\pi|_Q$ is the projection of $\pi$ on the actions of the threads in Q. Similarly, $\pi|_n$ is the projection of $\pi$ on its first n elements, for some number n. $\mathit{sz}(\pi)$ gives the length of the sequence $\pi$. We also define a product operator $\otimes$. Let $\pi$ and $\rho$ be execution fragments. Then, $\pi \otimes \rho$ is the same as $\pi$ except that if the i-th isu action of $\pi$ is not immediately followed by a com action of the same thread, then the i-th com action of $\rho$ is inserted after this isu. The product operator helps us fill unfinished writes in one execution fragment by inserting commit actions from another fragment immediately after the issue actions.
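The projections and the product operator are easy to prototype on lists of actions. In this Python sketch, an action is a hypothetical tuple (thread, kind, var, value), the operator $\otimes$ is written as `product`, and, as in its use in the proof, ρ is assumed to supply enough com actions.

```python
# Hypothetical action encoding: (thread, kind, var, value), kind in {"isu", "com", "rd"}.

def proj_threads(pi, q):
    """pi|_Q: the actions of pi performed by threads in q."""
    return [a for a in pi if a[0] in q]

def prefix(pi, n):
    """pi|_n: the first n actions of pi."""
    return pi[:n]

def product(pi, rho):
    """pi (x) rho: after the i-th isu of pi that is not immediately followed by a
    com of the same thread, insert the i-th com action of rho."""
    coms = [a for a in rho if a[1] == "com"]
    out, i = [], 0
    for k, a in enumerate(pi):
        out.append(a)
        if a[1] != "isu":
            continue
        i += 1                                  # a is the i-th isu of pi
        nxt = pi[k + 1] if k + 1 < len(pi) else None
        if nxt is None or nxt[1] != "com" or nxt[0] != a[0]:
            out.append(coms[i - 1])
    return out

pi = [("t", "isu", "x", 1), ("t", "rd", "y", 0), ("t", "isu", "z", 2), ("t", "com", "z", 2)]
rho = [("t", "com", "x", 1), ("t", "com", "z", 2)]
print(product(pi, rho))
```

Only the first isu gets a commit inserted; the second is already followed by its own com.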


Proof (Theorem 1). Assume P is not robust. Then, there exists a minimal violation $\pi = \pi_1, \alpha, \pi_2, \theta, \pi_3, \beta, \pi_4$ satisfying the conditions described before, where $\alpha = (t, \mathit{isu})$, $\theta = (t, \mathit{rd}, y, *)$ and $\beta = (t, \mathit{com}, x, *)$. Below, we show that the write instruction $w = \mathit{instOf}(\alpha)$ is not atomic.


1. w is not a left mover. 

1.1. $\rho = \pi_1, \pi_2|_{\overline{\{t\}}}, \pi_3|_{\mathit{sz}(\pi_3)-1}, \gamma, (\alpha, \beta)$ is an SC execution of P, where $\gamma$ is the last action of $\pi_3$. $\gamma$ is a read or write action on x performed by a thread $t'$ other than t, and the value of $\gamma$ is different from the value written by $\beta$.

1.1.1. $\rho$ is an SC execution because t never changes the value of a shared variable in $\pi_2$ and $\pi_3$. So, even if we remove the actions of t in those parts, the actions of other threads are still enabled. Since other threads perform only SC operations in $\pi$, $\pi_1, \pi_2|_{\overline{\{t\}}}, \pi_3$ is an SC execution. From $\pi$, we also know that the first enabled action of t is $\alpha$ if we delay the actions of t in $\pi_2$ and $\pi_3$.

1.1.2. The last action of $\pi_3$ is $\gamma$. By the definition of a minimal violation, we know that $\theta \to_{hb}^{+} \beta$ and that $\pi_3$ does not contain any action of t. So, there must exist an action $\gamma \in \pi_3$ such that either $\gamma$ reads from x and $\gamma \to_{fr} \beta$ in $\pi$, or $\gamma$ writes to x and $\gamma \to_{so} \beta$ in $\pi$. Moreover, $\gamma$ is the last action of $\pi_3$, because if there were other actions after $\gamma$, we could delete them and obtain another minimal violation shorter than $\pi$, contradicting the minimality of $\pi$.

1.2. $\rho' = \pi_1, \pi_2|_{\overline{\{t\}}}, \pi_3|_{\mathit{sz}(\pi_3)-1}, (\alpha, \beta), \gamma'$, where $\mathit{instOf}(\gamma') = \mathit{instOf}(\gamma)$, is either an SC execution with a different end state than the execution $\rho$ defined in 1.1, or it is not an SC execution.

1.2.1. In the last state of $\rho$, x has the value written by $\beta$. If $\gamma$ is a write action on x, then x has a different value at the end of $\rho'$ due to the definition of a minimal violation. If $\gamma$ is a read action on x, then it does not read the value written by $\beta$ in $\rho$. However, $\gamma'$ reads this value in $\rho'$. Hence, $\rho'$ is not a valid SC execution.

2. There exists a read instruction r buffer-free reachable from w such that r is not a right mover. We consider two cases: either there exists a rd action of t on a variable z in $\pi_2$ such that there is a later write action by another thread $t'$ on z in $\pi_2$ that writes a different value, or there is no such action. Moreover, z is not a variable touched by the delayed commits in $\pi_4$, i.e., the rd action does not read its value from the buffer.

2.1. We first consider the negation of the above condition. Assume that for all actions $\gamma$ and $\gamma'$ such that $\gamma$ occurs before $\gamma'$ in $\pi_2$, either $\gamma \neq (t, \mathit{rd}, z, v_z)$ or $\gamma' \neq (t', \mathit{isu})(t', \mathit{com}, z, v'_z)$. Then, $r = \mathit{instOf}(\theta)$ is not a right mover and it is buffer-free reachable from w.

2.1.1. $\rho = \pi_1, \pi_2|_{\overline{\{t\}}}, (\alpha, \beta), \pi_2|_{\{t\}} \otimes \pi_4, \theta, \theta'$ is a valid SC execution of P, where $\theta' = (t', \mathit{isu})(t', \mathit{com}, y, *)$ for some $t' \neq t$.

2.1.1.1. $\rho$ is an SC execution. $\pi_1, \pi_2|_{\overline{\{t\}}}$ is a valid SC execution since t does not update the value of a shared variable in $\pi_2$. Moreover, all of the actions of t become enabled after this sequence since t never reads the value of a variable updated by another thread in $\pi_2$. Lastly, the first action of $\pi_3$ is enabled after this sequence.

2.1.1.2. The first action of $\pi_3$ is $\theta' = (t', \mathit{isu})(t', \mathit{com}, y, *)$. Indeed, let $\theta'$ be the first action of $\pi_3$. Since $\theta \to_{hb} \theta'$ in $\pi$ and $\theta'$ is not an action of t by the definition of a minimal violation, the only possible case is $\theta \to_{fr} \theta'$. Hence, $\theta'$ is a write action on y that writes a different value than the one $\theta$ reads.

2.1.1.3. r is buffer-free reachable from w: $\rho$ is an SC execution, the first action of $\rho$ after $\pi_1, \pi_2|_{\overline{\{t\}}}$ is $(\alpha, \beta)$, $w = \mathit{instOf}((\alpha, \beta))$, $r = \mathit{instOf}(\theta)$, and the actions of t in $\rho$ between $(\alpha, \beta)$ and $\theta$ are not instances of a fence instruction or of a write to y.


2.1.2. $\rho' = \pi_1, \pi_2|_{\overline{\{t\}}}, (\alpha, \beta), \pi_2|_{\{t\}} \otimes \pi_4, \theta', \theta$ is not a valid SC execution.

2.1.2.1. In the last state of $\rho$, the value of y seen by t is the value read in $\theta$. It is different from the value written by $\theta'$. However, in the last state of $\rho'$, the value of y that t sees must be the value $\theta'$ writes. Hence, $\rho'$ is not a valid SC execution.


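2.2. Assume that there exist $\gamma = (t, \mathit{rd}, z, v_z)$ and $\gamma' = (t', \mathit{isu})(t', \mathit{com}, z, v'_z)$ in $\pi_2$, with $\gamma$ occurring before $\gamma'$. Then, $r = \mathit{instOf}(\gamma)$ is not a right mover and r is buffer-free reachable from w.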

2.2.1. Let i be the index of $\gamma$ and j be the index of $\gamma'$ in $\pi_2$. Then, define $\rho = \pi_1, \pi_2|_{j-1}|_{\overline{\{t\}}}, (\alpha, \beta), \pi_2|_{i}|_{\{t\}} \otimes \pi_4, \gamma'$. $\rho$ is an SC execution of P.

2.2.1.1. $\rho$ is an SC execution. The prefix $\pi_1, \pi_2|_{j-1}|_{\overline{\{t\}}}$ is a valid SC execution because t does not update any shared variable in $\pi_2$. Moreover, all of the actions of t in $\pi_2|_{i}|_{\{t\}} \otimes \pi_4$ become enabled after this sequence since t never reads the value of a variable updated by another thread in $\pi_2$, and $\gamma'$ is the next enabled action in $\pi$ after this sequence since it is a write action.

2.2.2. Let i and j be the indices of $\gamma$ and $\gamma'$ in $\pi_2$, respectively. Define $\rho' = \pi_1, \pi_2|_{j-1}|_{\overline{\{t\}}}, (\alpha, \beta), \pi_2|_{i-1}|_{\{t\}} \otimes \pi_4, \gamma', \gamma$. Then, $\rho'$ is not a valid SC execution.

2.2.2.1. In the last state of $\rho$, the value of z seen by t is $v_z$. It is different from the value $v'_z$ written by $\gamma'$. However, in the last state of $\rho'$, the value of z that t sees must be $v'_z$. Hence, $\rho'$ is not a valid SC execution.

2.2.3. r is buffer-free reachable from w because $\rho$ defined in 2.2.1 is an SC execution, the first action after $\pi_1, \pi_2|_{j-1}|_{\overline{\{t\}}}$ is $(\alpha, \beta)$, $w = \mathit{instOf}((\alpha, \beta))$, $r = \mathit{instOf}(\gamma)$, and the actions of t in $\rho$ between $(\alpha, \beta)$ and $\gamma$ are not instances of a fence instruction or of a write to z by t.


5 Abstractions and Verifying Non-robust Programs 


In this section, we introduce program abstractions that are useful for verifying non-robust TSO programs (or even robust programs; see an example at the end of this section). In general, a program P' abstracts another program P for some semantic model $M \in \{SC, TSO\}$ if every shared variable valuation $\sigma$ reachable from the initial state in an M execution of P is also reachable in an M execution of P'. We denote this abstraction relation as $P \preceq_M P'$.

In particular, we are interested in read instruction abstractions, which replace instructions that read from a shared variable with more “liberal” read instructions that can read more values (this way, the program may reach more shared variable valuations). We extend the program syntax in Sect. 3 with havoc instructions of the form havoc(⟨reg⟩, ⟨varbexpr⟩), where ⟨varbexpr⟩ is a boolean expression over a set of registers and a single shared variable ⟨var⟩. The meaning of this instruction is that the register reg is assigned any value that satisfies varbexpr (where the other registers and the variable var are interpreted with their current values). The program abstraction we consider replaces read instructions of the form ⟨reg⟩ := ⟨var⟩ with havoc instructions havoc(⟨reg⟩, ⟨varbexpr⟩).

While replacing read instructions with havoc instructions, we must guarantee 
that the new program reaches at least the same set of shared variable valuations 
after executing the havoc as the original program after the read. Hence, we 
allow such a rewriting only when the boolean expression varbexpr is weaker (in 
a logical sense) than the equality reg = var (hence, there exists an execution of 
the havoc instruction where reg = var). 


Lemma 2. Let P be a program and P' be obtained from P by replacing an instruction $l_1 : r := x;\ \mathtt{goto}\ l_2$ of a thread t with $l_1 : \mathit{havoc}(r, \phi(x, \vec{r}));\ \mathtt{goto}\ l_2$ such that $\forall x, r.\ x = r \Rightarrow \phi(x, \vec{r})$ is valid. Then, $P \preceq_{SC} P'$ and $P \preceq_{TSO} P'$.
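The side condition of Lemma 2 can be sanity-checked by brute force over a small domain. The Python sketch below encodes the constraint from Fig. 6 as a hypothetical function `phi_fig6(x, r1)` and assumes that φ depends only on x and the assigned register. A bounded search can refute $\forall x, r.\ x = r \Rightarrow \phi(x, r)$, but it cannot prove it for an unbounded domain; that would need a solver.

```python
def phi_fig6(x, r1):
    """Fig. 6 constraint: if x != 0 then r1 is x or 0; otherwise r1 is 0."""
    return (r1 == x or r1 == 0) if x != 0 else r1 == 0

def weakens_read(phi, domain):
    """Bounded check of Lemma 2's side condition: x = r implies phi(x, r)."""
    return all(phi(v, v) for v in domain)

def havoc_choices(phi, x, domain):
    """Values that havoc(r, phi) may assign to r when the shared variable holds x."""
    return [r for r in domain if phi(x, r)]

domain = range(-5, 6)
print(weakens_read(phi_fig6, domain))               # True
print(weakens_read(lambda x, r: r == 0, domain))    # False: e.g. x = r = 1 violates it
print(havoc_choices(phi_fig6, 3, domain))           # [0, 3]
```

The second constraint is rejected because it forbids the havoc from behaving like the original read.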


[Fig. 6: procedures foo and bar; bar reads x in a do-while loop that repeats while r1 == 0, with the comment //havoc(r1, (x != 0) ? r1 = x ∨ r1 = 0 : r1 = 0)]

Fig. 6. An example program that needs a read abstraction to pass our robustness checks. The havoc statement in the comments reads as follows: if the value of x is not 0, then r1 gets either the value of x or 0; otherwise, it is 0.

The notion of trace extends to programs that contain havoc instructions as follows. Assume that $(t, \mathit{hvc}, x, \phi(x))$ is the action generated by an instruction $\mathit{havoc}(r, \phi(x, \vec{r}))$, where x is a shared variable and $\vec{r}$ a set of registers (the action stores the constraint $\phi$ where the values of the registers are instantiated with their current values; the shared variable x is the only free variable in $\phi(x)$). Roughly, the hvc actions are special cases of rd actions. Consider an execution $\pi$ where an action $\alpha = (t, \mathit{hvc}, x, \phi(x))$ is generated by reading the value of a write action $\beta = (\mathit{com}, x, v)$ (i.e., the value v was the current value of x when the havoc instruction was executed). Then, the trace of $\pi$ contains a read-from edge $\beta \to_{rf} \alpha$, as for regular read actions. However, fr edges are created differently. If $\alpha$ were a rd action, we would say that $\alpha \to_{fr} \gamma$ if $\beta \to_{rf} \alpha$ and $\beta \to_{so} \gamma$. For the havoc case, the situation is a little different. Let $\gamma = (\mathit{com}, x, v')$ be an action. We have $\alpha \to_{fr} \gamma$ if and only if either $\beta \to_{rf} \alpha$, $\beta \to_{so} \gamma$ and $\phi(v')$ is false, or $\alpha \to_{fr} \gamma'$ and $\gamma' \to_{so} \gamma$ for some action $\gamma'$. Intuitively, there is a from-read dependency from a havoc action to a commit action only when the commit action invalidates the constraint $\phi(x)$ of the havoc (or when it follows such a commit in store order).
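The two fr rules can be prototyped directly. In this Python sketch (all names hypothetical), the commits on x are given by their values in commit order, a commit is identified by its index, and so relates an earlier commit to a later one only when their values differ, as in Sect. 3. `fig6_constraint` instantiates the Fig. 6 constraint with the value that r1 received.

```python
def so(commits, a, b):
    """Store order: commit a precedes commit b and they write different values."""
    return a < b and commits[a] != commits[b]

def fr_read(src, commits):
    """fr successors of a plain read that reads from commit src."""
    return {k for k in range(len(commits)) if so(commits, src, k)}

def fr_havoc(src, commits, phi):
    """fr successors of a havoc action reading from commit src: later commits that
    falsify phi, closed under store-order successors."""
    fr = {k for k in range(len(commits)) if so(commits, src, k) and not phi(commits[k])}
    while True:
        new = {k for j in fr for k in range(len(commits)) if so(commits, j, k)} - fr
        if not new:
            return fr
        fr |= new

def fig6_constraint(r1):
    """phi(x) of the Fig. 6 havoc, with register r1 fixed to its current value."""
    return lambda x: (r1 == x or r1 == 0) if x != 0 else r1 == 0

commits = [0, 5, 0, 7]                               # values committed to x, in order
print(fr_read(1, commits))                           # {2, 3}
print(fr_havoc(1, commits, fig6_constraint(5)))      # {2, 3}
print(fr_havoc(1, commits, fig6_constraint(0)))      # set()
```

When r1 receives 0, no commit can falsify the instantiated constraint, so the havoc has no fr successor even though a plain read of the same commit would have two.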

The notion of write-atomicity (Definition 1) extends to programs with havoc instructions by interpreting havoc instructions $\mathit{havoc}(r, \phi(x, \vec{r}))$ as regular read instructions $r := x$. Theorem 1, which states that write-atomicity implies robustness, can also be easily extended to this case.

Read abstractions are useful in two ways. First, they allow us to prove properties of non-robust programs, such as the work stealing queue example in Fig. 3. We can apply appropriate read abstractions to relax the original program so that it becomes robust in the end. Then, we can use SC reasoning tools on the robust program to prove invariants of the program.

Second, read abstractions could be helpful for proving robustness directly. 
The method based on write-atomicity we propose for verifying robustness is 
sound but not complete. Some incompleteness scenarios can be avoided using 
read abstractions. If we can abstract read instructions such that the new program 
reaches exactly the same states (in terms of shared variables) as the original one, 
it may help to avoid executions that violate mover checks. 

Consider the program in Fig. 6. The write statement x := 1 in procedure foo is not atomic. It is not a left mover due to the read of x in the do-while loop of bar. Moreover, the later read from y is buffer-free reachable from this write, and it is not a right mover because of the write to y in bar. To make the write atomic, we apply a read abstraction to the read instruction of bar that reads from x. In the new relaxed read, r1 can read 0 in addition to the value of x when x is not zero, as shown in the comment below the instruction. With this abstraction, the write to x becomes a left mover because reads from x after the write can now read the old value, which was 0. Thus, the program becomes write-atomic. If we consider the TSO traces of the abstract program and replace hvc nodes with rd nodes, we get exactly the TSO traces of the original program. However, the abstraction adds more SC traces to the program, and the program becomes robust.


6 Experimental Evaluation 


To test the practical value of our method, we have considered the benchmark for checking TSO robustness described in [2], which consists of 34 programs. This benchmark is quite exhaustive: it includes examples introduced in previous works on this subject. Many of the programs in this benchmark are easy to prove write-atomic: no write is followed by a buffer-free reachable read instruction, which makes every write trivially atomic (like the message passing program in Fig. 1). This holds for 20 out of the 34 programs. Out of the remaining programs, 13 required mover checks and 4 of them required read abstractions to show robustness (our method did not succeed on one of the programs in the benchmark, as explained at the end of this section). Except for Chase-Lev, the initial versions of the other 12 examples are all trace robust². Besides Chase-Lev, the read abstractions are equivalent to the original programs in terms of reachable shared variable configurations. Detailed information for these examples can be found in Table 1.

To check whether writes/reads are left/right movers and to check the soundness of abstractions, we have used the tool CIVL [13]. This tool allows one to prove assertions about concurrent programs (Owicki-Gries annotations) and also to check whether an instruction is a left/right mover. The buffer-free read instructions reachable from a write before a fence were obtained using a trivial analysis of the control-flow graph (CFG) of the program. This method is a sound approximation of the definition in Sect. 4, but it was sufficient for all the examples.
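A CFG-based over-approximation of buffer-free reachability might look as follows. This Python sketch is not the authors' analysis: the CFG encoding (label → kind, variable, successors) and the function name are hypothetical. Every feasible execution follows some CFG path, so the computed set contains every read that is buffer-free reachable in the sense of Sect. 4. Checking all of the returned reads for right-moverness therefore remains sound.

```python
from collections import deque

# Hypothetical CFG of one thread: label -> (kind, var, successors), where kind is
# "write", "read", "fence" or "other" (var is None when irrelevant).

def bf_reachable_reads(cfg, w):
    """Reads reachable from write w along a CFG path that crosses no fence and no
    write to the variable being read."""
    result = set()
    read_vars = {var for kind, var, _ in cfg.values() if kind == "read"}
    for y in read_vars:
        seen, todo = set(), deque(cfg[w][2])
        while todo:
            u = todo.popleft()
            if u in seen:
                continue
            seen.add(u)
            kind, var, succ = cfg[u]
            if kind == "read" and var == y:
                result.add(u)
            if kind == "fence" or (kind == "write" and var == y):
                continue                    # paths through u no longer count for y
            todo.extend(succ)
    return result

cfg = {"w": ("write", "x", ["wy"]), "wy": ("write", "y", ["ry"]),
       "ry": ("read", "y", ["rz"]), "rz": ("read", "z", ["f"]),
       "f": ("fence", None, ["rx"]), "rx": ("read", "x", [])}
print(bf_reachable_reads(cfg, "w"))   # {'rz'}
```

Here the read of y is cut off by the intermediate write to y, and the read of x by the fence.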

Our method was not precise enough to prove robustness for only one example, named nbw-w-1r-r1 in [7]. This program contains a method with explicit calls
to the lock and unlock methods of a spinlock. The instruction that writes to 
the lock variable inside the unlock method is not atomic, because of the reads 
from the lock variable and the calls to the getAndSet primitive inside the lock 
method. Abstracting the reads from the lock variable is not sufficient in this 
case due to the conflicts with getAndSet actions. However, we believe that read 
abstractions could be extended to getAndSet instructions (which both read and 
write to a shared variable atomically) in order to deal with this example. 


² If we consider the standard notion of so (which relates any two writes on the same variable independently of their values), all examples except MCSLock and dc-locking become non-trace-robust.
Table 1. Benchmark results. The second column (RB) gives the robustness status of the original program according to our extended hb definition. The RA column shows the number of read abstractions performed. The RM column gives the number of read instructions that are checked to be right movers, and the LM column the number of write instructions that are shown to be left movers. PO shows the total number of proof obligations generated, and VT is the total verification time in seconds.


Name             RB   RA   RM   LM   PO    VT
Chase-Lev        −    1    2    -    149   0.332
FIFO-iWSQ        +    -    2    -    124   0.323
LIFO-iWSQ        +    -    1    -    109   0.305
Anchor-iWSQ      +    -    1    -    109   0.309
MCSLock          +    2    2    -    233   0.499
r+detour         +    -    1    -    53    0.266
r+detours        +    -    1    -    64    0.273
sb+detours+coh   +    -    2    -    108   0.322
sb+detours       +    -    1    1    125   0.316
write+r+coh      +    -    1    -    78    0.289
write+r          +    -    1    -    48    0.261
dc-locking       +    1    4    1    52    0.284
inline_pgsql     +    2    2    -    90    0.286
inline_pgsql + |2 2 - 90 | 0.286 


7 Related Work 


The weakest correctness criterion that enables SC reasoning for proving invariants of programs running under TSO is state-robustness, i.e., the reachable set of states is the same under both SC and TSO. However, this problem has high complexity (non-primitive recursive for programs with a finite number of threads and a finite data domain [6]). Therefore, it is difficult to come up with an efficient and precise solution. A symbolic decision procedure is presented in [1], and over-approximate analyses are proposed in [14,15].

Due to the high complexity of state-robustness, stronger correctness crite- 
ria with lower complexity have been proposed. Trace-robustness (that we call 
simply robustness in our paper) is one of the most studied criteria in the litera- 
ture. Bouajjani et al. [9] have proved that deciding trace-robustness is PSPACE- 
complete, resp., EXPSPACE-complete, for a finite, resp., unbounded, number of 
threads and a finite data domain. 

There are various tools for checking trace-robustness. TRENCHER [7] applies to bounded-thread programs with finite data. In theory, the approach in Trencher can be applied to infinite-state programs, but implementing it is not obvious because it requires solving non-trivial reachability queries in such programs. In comparison, our approach (and our implementation based on CIVL) applies to infinite-state programs. All our examples consider infinite data domains, while Chase-Lev, FIFO-iWSQ, LIFO-iWSQ, Anchor-iWSQ, MCSLock, dc-locking and inline_pgsql have an unbounded number of threads. MUSKETEER [4] provides an approximate solution by checking the existence of critical cycles on the control-flow graph. While Musketeer can deal with infinite data (since data is abstracted away), it is restricted to bounded-thread programs. Thus, it cannot deal with the unbounded-thread examples mentioned above. Furthermore, Musketeer cannot prove robustness even for some examples with finitely many threads, e.g., nbw_w_wr, write+r, r+detours, sb+detours+coh. Other tools for approximate robustness checking, to which we compare in similar ways, have been proposed in [5, 10, 11].

Besides trace-robustness, there are other correctness criteria like triangular 
race freedom (TRF) and persistence that are stronger than state-robustness. 
Persistence [2] is incomparable to trace-robustness, and TRF [18] is stronger 
than both trace-robustness and persistence. Our method can verify examples 
that are state-robust but neither persistent nor TRF. 

Reduction and abstraction techniques have been used for reasoning about SC programs. QED [12] is a tool that supports statement transformations as a way of abstracting programs, combined with a mover analysis. Also, CIVL [13] allows proving location assertions in the context of the Owicki-Gries logic, enhanced with Lipton's reduction theory [17]. Our work enables the use of such tools for reasoning about the TSO semantics of a program.
Abstract. A dynamic partial order reduction (DPOR) algorithm is optimal when it always explores at most one representative per Mazurkiewicz trace. Existing literature suggests that the reduction obtained by the non-optimal, state-of-the-art Source-DPOR (SDPOR) algorithm is comparable to optimal DPOR. We show the first program with $O(n)$ Mazurkiewicz traces where SDPOR explores $O(2^n)$ redundant schedules (as this paper was under review, we were made aware of the recent publication of another paper [3] which contains an independently discovered example program with the same characteristics). We furthermore identify the cause of this blow-up as an NP-hard problem. Our main contribution is a new approach, called Quasi-Optimal POR, that can arbitrarily approximate an optimal exploration using a provided constant k. We present an implementation of our method in a new tool called DPU using specialised data structures. Experiments with DPU, including Debian packages, show that optimality is achieved with low values of k, outperforming state-of-the-art tools.


1 Introduction 


Dynamic partial-order reduction (DPOR) [1,10,19] is a mature approach to mitigate the state explosion problem in stateless model checking of multithreaded programs. DPORs are based on Mazurkiewicz trace theory [13], a true-concurrency semantics where the set of executions of the program is partitioned into equivalence classes known as Mazurkiewicz traces (M-traces). In a DPOR, this partitioning is defined by an independence relation over concurrent actions that is computed dynamically, and the method explores executions which are representatives of M-traces. The exploration is sound when it explores all M-traces, and it is considered optimal [1] when it explores each M-trace only once.
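For intuition about M-trace equivalence, the following Python sketch decides whether two action sequences belong to the same M-trace using the classical projection lemma of trace theory: for a reflexive and symmetric dependence relation D, two words are equivalent iff their projections onto {a, b} coincide for every pair (a, b) in D. Actions are single characters and `dependent` is a hypothetical predicate; a DPOR tool computes dependence dynamically instead.

```python
from itertools import combinations_with_replacement

def trace_equivalent(u, v, dependent):
    """True iff words u and v are Mazurkiewicz-equivalent under the symmetric
    dependence predicate `dependent` (reflexivity is added here)."""
    letters = sorted(set(u) | set(v))
    for a, b in combinations_with_replacement(letters, 2):
        if a == b or dependent(a, b):
            if [ch for ch in u if ch in (a, b)] != [ch for ch in v if ch in (a, b)]:
                return False
    return True

# Hypothetical actions: a and b write different variables, c reads the variable a writes.
dep = lambda p, q: {p, q} == {"a", "c"}
print(trace_equivalent("abc", "bac", dep))   # True: only independent a and b swapped
print(trace_equivalent("abc", "cab", dep))   # False: the order of a and c changed
```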

Since two independent actions might have to be explored from the same state in order to explore all M-traces, a DPOR algorithm uses independence to compute a provably-sufficient subset of the enabled transitions to explore for each state encountered. Typically this involves the combination of forward reasoning (persistent sets [11] or source sets [1,4]) with backward reasoning (sleep sets [11]) to obtain a more efficient exploration. However, in order to obtain optimality, a DPOR needs to compute sequences of transitions (as opposed to sets of enabled transitions) that avoid visiting a previously visited M-trace. These sequences are stored in a data structure called wakeup trees in [1] and known as alternatives in [19]. Computing these sequences thus amounts to deciding whether the DPOR needs to visit yet another M-trace (or all have already been seen).

Fig. 1. (a) Programs; (b) partially-ordered executions;

In this paper, we prove that computing alternatives in an optimal DPOR is an NP-complete problem. To the best of our knowledge, this is the first formal complexity result on this important subproblem that optimal and non-optimal DPORs need to solve. The program shown in Fig. 1(a) illustrates a practical consequence of this result: the non-optimal, state-of-the-art SDPOR algorithm [1] can explore $O(2^n)$ interleavings here, but the program has only $O(n)$ M-traces.

The program contains $n := 3$ writer threads $w_0, w_1, w_2$, each writing to a different variable. The thread count increments a zero-initialized counter c $n - 1$ times. Thread master reads c into variable i and writes to $x_i$.

The statements $x_0 = 7$ and $x_1 = 8$ are independent because they produce the same state regardless of their execution order. The statement i = c and any statement in the count thread are dependent, or interfering: their execution orders result in different states. Similarly, $x_i = 0$ interferes with exactly one writer thread, depending on the value of i.

Using this independence relation, the set of executions of this program can be partitioned into six M-traces, corresponding to the six partial orders shown in Fig. 1(b). Thus, an optimal DPOR explores six executions ($2n$ executions for n writers). We now show why SDPOR explores $O(2^n)$ executions in the general case. Conceptually, SDPOR is a loop that (1) runs the program, (2) identifies two dependent statements that can be swapped, and (3) reverses them and re-executes the program. It terminates when no more dependent statements can be swapped.
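The count of six M-traces can be cross-checked by brute force. The Python sketch below is a hypothetical model of the program in Fig. 1(a): writer wj performs a single write to xj, count performs n − 1 increments of c (each reading and writing c), and master performs i = c followed by x_i = 0. It enumerates every interleaving, records which variables each event reads and writes, and identifies an M-trace by its set of events together with the order of every pair of conflicting events from different threads. This is exhaustive enumeration, not a DPOR.

```python
def interleavings(lengths):
    """Every schedule (tuple of thread names) in which thread t takes lengths[t] steps."""
    def go(left, prefix):
        if not any(left.values()):
            yield tuple(prefix)
            return
        for t in left:
            if left[t]:
                left[t] -= 1
                prefix.append(t)
                yield from go(left, prefix)
                prefix.pop()
                left[t] += 1
    yield from go(dict(lengths), [])

def run(schedule):
    """Execute one schedule; each event is (thread, step, vars read, vars written)."""
    c, i, steps, events = 0, None, {}, []
    for t in schedule:
        k = steps.get(t, 0)
        steps[t] = k + 1
        if t.startswith("w"):                       # writer wj: write xj
            acc = (frozenset(), frozenset({"x" + t[1:]}))
        elif t == "count":                          # c = c + 1
            c += 1
            acc = (frozenset({"c"}), frozenset({"c"}))
        elif k == 0:                                # master: i = c
            i = c
            acc = (frozenset({"c"}), frozenset())
        else:                                       # master: x_i = 0
            acc = (frozenset(), frozenset({f"x{i}"}))
        events.append((t, k) + acc)
    return events

def conflict(e, f):
    """Events of different threads touching a common variable, at least one writing it."""
    (t1, _, r1, w1), (t2, _, r2, w2) = e, f
    return t1 != t2 and bool(w1 & (r2 | w2) or w2 & r1)

def mtrace(events):
    """Canonical M-trace: the events plus the order of each conflicting pair."""
    order = {(e, f) for k, e in enumerate(events) for f in events[k + 1:] if conflict(e, f)}
    return frozenset(events), frozenset(order)

def count_mtraces(n):
    lengths = {f"w{j}": 1 for j in range(n)}
    lengths.update(count=n - 1, master=2)
    return len({mtrace(run(s)) for s in interleavings(lengths)})

print(count_mtraces(3))   # 6
```

Each M-trace is fixed by the value i that master reads (n choices) and by the order of $x_i = 0$ relative to the write of $w_i$ (2 choices), which gives the $2n$ stated above.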

Consider the interference on the counter variable c between the master and the count thread. Their execution order determines which writer thread interferes with the master statement $x_i = 0$. If c = 1 is executed just before i = c, then $x_i = 0$ interferes with $w_1$. However, if i = c is executed first, then $x_i = 0$ interferes with $w_0$. Since SDPOR does not track relations between dependent statements, it will naively try to reverse the race between $x_i = 0$ and all writer threads, which results in exploring $O(2^n)$ executions. In this program, exploring only six traces requires understanding the entanglement between both interferences, as the order in which the first is reversed determines the second.

As a trade-off between solving this NP-complete problem and potentially exploring an exponential number of redundant schedules, we propose a hybrid approach called Quasi-Optimal POR (QPOR), which can turn a non-optimal DPOR into an optimal one. In particular, we provide a polynomial algorithm to compute alternative executions that can arbitrarily approximate the optimal solution based on a user-specified constant k. The key concept is a new notion of k-partial alternative, which can intuitively be seen as a “good enough” alternative: it reverts two interfering statements while remembering the resolution of the last $k - 1$ interferences.

The major differences between QPOR and the DPORs of [1] are that: (1) QPOR is based on prime event structures [17], a partial-order semantics that has recently been applied to programs [19,21], instead of a sequential view of thread interleaving, and (2) it computes k-partial alternatives with an $O(n^k)$ algorithm, while optimal DPOR corresponds to computing ∞-partial alternatives with an $O(2^n)$ algorithm. For the program shown in Fig. 1(a), QPOR achieves optimality with k = 2 because each race is coupled with at most one other race. As expected, the cost of computing k-partial alternatives and the reductions obtained by the method both increase with higher values of k.

Finding k-partial alternatives requires decision procedures for traversing the causality and conflict relations in event structures. Our main algorithmic contribution is to represent these relations as a set of trees where events are encoded as one or two nodes in two different trees. We show that checking causality/conflict between events amounts to an efficient traversal in one of these trees.

In summary, our main contributions are: 


– Proof that computing alternatives for optimal DPOR is NP-complete (Sect. 4).

– Efficient data structures and algorithms for (1) computing k-partial alternatives in polynomial time, and (2) representing and traversing partial orders (Sect. 5).

– Implementation of QPOR in a new tool called DPU and experimental evaluations against SDPOR in NIDHUGG and the testing tool MAPLE (Sect. 6).

– Benchmarks with $O(n)$ M-traces where SDPOR explores $O(2^n)$ executions (Sect. 6).


Furthermore, in Sect. 6 we show that: (1) low values of k often achieve optimality; (2) even with non-optimal explorations, DPU greatly outperforms NIDHUGG; (3) DPU copes with production code in Debian packages and achieves much higher state space coverage and efficiency than MAPLE.

Proofs for all our formal results are available in the unabridged version [15]. 


2 Preliminaries 


In this section we provide the formal background used throughout the paper. 
Concurrent Programs. We consider deterministic concurrent programs composed of a fixed number of threads that communicate via shared memory and synchronize using mutexes (Fig. 1(a) can be trivially modified to satisfy this). We also assume that local statements can only modify shared memory within a mutex block. Therefore, it suffices to consider only races of mutex accesses.

Formally, a concurrent program is a structure $P := \langle \mathcal{M}, \mathcal{L}, T, m_0, l_0 \rangle$, where $\mathcal{M}$ is the set of memory states (valuations of program variables, including instruction pointers), $\mathcal{L}$ is the set of mutexes, $m_0$ is the initial memory state, $l_0$ is the initial mutexes state and T is the set of thread statements. A thread statement $t := \langle i, f \rangle$ is a pair where $i \in \mathbb{N}$ is the thread identifier associated with the statement and $f : \mathcal{M} \to (\mathcal{M} \times \mathcal{A})$ is a partial function that models the transformation of the memory as well as the effect $\mathcal{A} := \{\mathit{loc}\} \cup (\{\mathit{acq}, \mathit{rel}\} \times \mathcal{L})$ of the statement with respect to thread synchronization. Statements of loc effect model local thread code. Statements associated with $\langle \mathit{acq}, x \rangle$ or $\langle \mathit{rel}, x \rangle$ model lock and unlock operations on a mutex x. Finally, we assume that (1) functions f are PTIME-decidable; (2) acq/rel statements do not modify the memory; and (3) loc statements modify thread-shared memory only within lock/unlock blocks. When (3) is violated, P has a data race (undefined behavior in almost all languages), and our technique can be used to find such statements; see Sect. 6.

We use labelled transition system (LTS) semantics for our programs. We
associate a program $P$ with the LTS $M_P := \langle S, \to, A, s_0 \rangle$. The set
$S := M \times (\mathcal{L} \to \{0, 1\})$ contains the states of $M_P$, i.e., pairs of the form
$\langle m, v \rangle$ where $m$ is the state of the memory and $v$ indicates whether a mutex is
locked (1) or unlocked (0). The actions in $A \subseteq \mathbb{N} \times \Lambda$ are pairs $\langle i, b \rangle$ where $i$
is the identifier of the thread that executes some statement and $b$ is the effect
of the statement. We use the function $p \colon A \to \mathbb{N}$ to retrieve the thread identifier.
The transition relation $\to \subseteq S \times A \times S$ contains a triple
$\langle m, v \rangle \xrightarrow{\langle i, b \rangle} \langle m', v' \rangle$ exactly when there is some thread statement
$\langle i, f \rangle \in T$ such that $f(m) = \langle m', b \rangle$ and either (1) $b = \mathsf{loc}$ and $v' = v$, or
(2) $b = \langle \mathsf{acq}, x \rangle$ and $v(x) = 0$ and $v' = v_{x \mapsto 1}$, or (3) $b = \langle \mathsf{rel}, x \rangle$ and
$v' = v_{x \mapsto 0}$. Notation $f_{x \mapsto y}$ denotes a function that behaves like $f$ for all inputs
except for $x$, where $f(x) = y$. The initial state is $s_0 := \langle m_0, l_0 \rangle$.

Furthermore, if $s \xrightarrow{a} s'$ is a transition, the action $a$ is enabled at $s$. Let
$\mathit{enabl}(s)$ denote the set of actions enabled at $s$. A sequence $\sigma := a_1 \ldots a_n \in A^*$
is a run when there are states $s_1, \ldots, s_n$ satisfying $s_0 \xrightarrow{a_1} s_1 \ldots \xrightarrow{a_n} s_n$.
We define $\mathit{state}(\sigma) := s_n$. We let $\mathit{runs}(M_P)$ denote the set of all runs and
$\mathit{reach}(M_P) := \{\mathit{state}(\sigma) \in S : \sigma \in \mathit{runs}(M_P)\}$ the set of all reachable states.


Independence. Dynamic partial-order reduction methods use a notion called
independence to avoid exploring concurrent interleavings that lead to the same
state. We recall the standard notion of independence for actions in [11]. Two
actions $a, a' \in A$ commute at a state $s \in S$ iff

– if $a \in \mathit{enabl}(s)$ and $s \xrightarrow{a} s'$, then $a' \in \mathit{enabl}(s)$ iff $a' \in \mathit{enabl}(s')$; and
– if $a, a' \in \mathit{enabl}(s)$, then there is a state $s''$ such that $s \xrightarrow{a \cdot a'} s''$ and $s \xrightarrow{a' \cdot a} s''$.
Independence between actions is an under-approximation of commutativity.
A binary relation $\diamond \subseteq A \times A$ is an independence on $M_P$ if it is symmetric,
irreflexive, and every pair $\langle a, a' \rangle$ in $\diamond$ commutes at every state in $\mathit{reach}(M_P)$.

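In general, $M_P$ has multiple independence relations; clearly, $\emptyset$ is always one
of them. We define the relation $\diamond_P \subseteq A \times A$ as the smallest irreflexive, symmetric
relation where $\langle i, b \rangle \diamond_P \langle i', b' \rangle$ holds if $i \neq i'$ and either $b = \mathsf{loc}$, or
$b = \langle \mathsf{acq}, x \rangle$ and $b' \notin \{\langle \mathsf{acq}, x \rangle, \langle \mathsf{rel}, x \rangle\}$. By construction, $\diamond_P$ is always
an independence.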
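For illustration, $\diamond_P$ can be decided from its definition alone. The following Python sketch is ours, not taken from DPU; it assumes a hypothetical encoding in which an action is a pair `(thread_id, effect)` and an effect is either the string `"loc"` or a pair such as `("acq", "m")`.

```python
from typing import Tuple, Union

# Hypothetical encoding (ours): an effect is "loc" or a pair such as ("acq", "m");
# an action is a pair (thread_id, effect).
Effect = Union[str, Tuple[str, str]]
Action = Tuple[int, Effect]


def _base(b: Effect, b2: Effect) -> bool:
    """One direction of the definition: b = loc, or b = acq x and b2 not in {acq x, rel x}."""
    if b == "loc":
        return True
    if b[0] == "acq":
        x = b[1]
        return b2 not in (("acq", x), ("rel", x))
    return False


def independent(a: Action, a2: Action) -> bool:
    """Decide a diamond_P a2 (symmetric and irreflexive by construction)."""
    (i, b), (j, b2) = a, a2
    if i == j:
        # Actions of the same thread are never independent.
        return False
    # Symmetric closure of the one-directional base case.
    return _base(b, b2) or _base(b2, b)


print(independent((0, ("acq", "m")), (1, ("rel", "m"))))  # False: same mutex m
print(independent((0, "loc"), (1, ("rel", "m"))))         # True
```

The check only inspects the two labels, which is what makes $\diamond_P$ cheap to evaluate while the unfolding is being built.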


Labelled Prime Event Structures. Prime event structures (PES) are well-known
non-interleaving, partial-order semantics [7,8,16]. Let $X$ be a set of actions.
A PES over $X$ is a structure $\mathcal{E} := \langle E, <, \#, h \rangle$ where $E$ is a set of events,
$< \subseteq E \times E$ is a strict partial order called causality relation, $\# \subseteq E \times E$ is a
symmetric, irreflexive conflict relation, and $h \colon E \to X$ is a labelling function.
Causality represents the happens-before relation between events, and conflict
between two events expresses that any execution includes at most one of them.
Figure 2(b) shows a PES over $\mathbb{N} \times \Lambda$ where causality is depicted by arrows, conflicts
by dotted lines, and the labelling $h$ is shown next to the events, e.g., $1 < 5$, $8 < 12$,
$2 \# 8$, and $h(1) = \langle 0, \mathsf{loc} \rangle$. The history of an event $e$, $\lceil e \rceil := \{e' \in E : e' < e\}$,
is the least set of events that need to happen before $e$.

The notion of concurrent execution in a PES is captured by the concept
of configuration. A configuration is a (partially ordered) execution of the system,
i.e., a set $C \subseteq E$ of events that is causally closed (if $e \in C$, then $\lceil e \rceil \subseteq C$)
and conflict-free (if $e, e' \in C$, then $\neg(e \# e')$). In Fig. 2(b), the set $\{8, 9, 15\}$
is a configuration, but $\{3\}$ or $\{1, 2, 8\}$ are not. We let $\mathit{conf}(\mathcal{E})$ denote the set
of all configurations of $\mathcal{E}$, and $[e] := \lceil e \rceil \cup \{e\}$ the local configuration of $e$. In
Fig. 2(b), $[11] = \{1, 8, 9, 10, 11\}$. A configuration represents a set of interleavings
over $X$. An interleaving is a sequence in $X^*$ that labels any topological sorting
of the events in $C$; we write $\mathit{inter}(C)$ for the set of interleavings of $C$. In Fig. 2(b),
$\mathit{inter}(\{1, 8\}) = \{ab, ba\}$ with $a := \langle 0, \mathsf{loc} \rangle$ and $b := \langle 1, \mathsf{acq}\ m \rangle$.
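Both conditions of a configuration, and the set $\mathit{inter}(C)$, can be checked by brute force on a tiny PES. The sketch below is ours and assumes a hypothetical four-event fragment loosely modelled on events 1, 2, 8, and 9 of Fig. 2(b); causality and conflict are given as explicit pair sets (conflict already symmetric and inherited), and the permutation-based enumeration is only meant for very small configurations.

```python
from itertools import permutations

# Hypothetical PES fragment (ours), loosely modelled on events 1, 2, 8, 9 of Fig. 2(b).
CAUSALITY = {(1, 2), (8, 9)}                  # pairs (e, e') with e < e'
CONFLICT = {(2, 8), (8, 2), (2, 9), (9, 2)}   # symmetric, inherited conflicts included
LABEL = {1: (0, "loc"), 2: (0, ("acq", "m")), 8: (1, ("acq", "m")), 9: (1, "loc")}


def history(e):
    """The history of e: the set of its strict causal predecessors."""
    return {x for (x, y) in CAUSALITY if y == e}


def is_configuration(c):
    """Causally closed and conflict-free."""
    causally_closed = all(history(e) <= c for e in c)
    conflict_free = all((e, f) not in CONFLICT for e in c for f in c)
    return causally_closed and conflict_free


def interleavings(c):
    """inter(c): the label sequences of all topological sortings of c (brute force)."""
    result = set()
    for order in permutations(sorted(c)):
        pos = {e: k for k, e in enumerate(order)}
        if all(pos[x] < pos[y] for (x, y) in CAUSALITY if x in c and y in c):
            result.add(tuple(LABEL[e] for e in order))
    return result


print(is_configuration({8, 9}), is_configuration({1, 2, 8}))  # True False
print(len(interleavings({1, 8})))  # 2: the sequences ab and ba
```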

The extensions of $C$ are the events not in $C$ whose histories are included in $C$:
$\mathit{ex}(C) := \{e \in E : e \notin C \land \lceil e \rceil \subseteq C\}$. The enabled events of $C$ are the
extensions that can form a larger configuration:
$\mathit{en}(C) := \{e \in \mathit{ex}(C) : C \cup \{e\} \in \mathit{conf}(\mathcal{E})\}$. Finally, the conflicting extensions of $C$
are the extensions that are not enabled: $\mathit{cex}(C) := \mathit{ex}(C) \setminus \mathit{en}(C)$. In Fig. 2(b),
$\mathit{ex}(\{1, 8\}) = \{2, 9, 15\}$, $\mathit{en}(\{1, 8\}) = \{9, 15\}$, and $\mathit{cex}(\{1, 8\}) = \{2\}$. See [20] for
more information on PES concepts.
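The three extension sets follow directly from these definitions. In this sketch (ours), a hypothetical fragment with events 1, 2, 8, and 9 is extended with an event 15 that has no causes and no conflicts, so that the computed sets coincide with the values quoted above for $\{1, 8\}$.

```python
# Hypothetical fragment (ours), loosely modelled on Fig. 2(b): events 1, 2, 8, 9, 15.
EVENTS = {1, 2, 8, 9, 15}
CAUSALITY = {(1, 2), (8, 9)}                  # pairs (e, e') with e < e', transitively closed
CONFLICT = {(2, 8), (8, 2), (2, 9), (9, 2)}   # symmetric, inherited conflicts included


def history(e):
    return {x for (x, y) in CAUSALITY if y == e}


def is_configuration(c):
    return all(history(e) <= c for e in c) and not any(
        (e, f) in CONFLICT for e in c for f in c
    )


def ex(c):
    """Extensions: events outside c whose whole history lies in c."""
    return {e for e in EVENTS if e not in c and history(e) <= c}


def en(c):
    """Enabled events: extensions that keep c conflict-free."""
    return {e for e in ex(c) if is_configuration(c | {e})}


def cex(c):
    """Conflicting extensions: extensions that are not enabled."""
    return ex(c) - en(c)


print(sorted(ex({1, 8})), sorted(en({1, 8})), sorted(cex({1, 8})))  # [2, 9, 15] [9, 15] [2]
```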


Parametric Unfolding Semantics. We recall the program PES semantics of [19,20]
(modulo notation differences). For a program $P$ and any independence $\diamond$ on $M_P$
we define a PES $\mathcal{U}_{P,\diamond}$ that represents the behavior of $P$, i.e., such that the
interleavings of its set of configurations equals $\mathit{runs}(M_P)$.

Each event in $\mathcal{U}_{P,\diamond}$ is defined by a canonical name of the form $e := \langle a, H \rangle$,
where $a \in A$ is an action of $M_P$ and $H$ is a configuration of $\mathcal{U}_{P,\diamond}$. Intuitively,
$e$ represents the action $a$ after the history (or the causes) $H$. Figure 2(b) shows an
example. Event 11 is $\langle \langle 0, \mathsf{acq}\ m \rangle, \{1, 8, 9, 10\} \rangle$ and event 1 is $\langle \langle 0, \mathsf{loc} \rangle, \emptyset \rangle$.
Note the inductive nature of the name, and how it allows us to uniquely identify
each event. We define the state of a configuration as the state reached by any
of its interleavings. Formally, for $C \in \mathit{conf}(\mathcal{U}_{P,\diamond})$ we define $\mathit{state}(C)$ as $s_0$ if
$C = \emptyset$, and as $\mathit{state}(\sigma)$ for some $\sigma \in \mathit{inter}(C)$ if $C \neq \emptyset$. Despite its appearance,
$\mathit{state}(C)$ is well-defined because all sequences in $\mathit{inter}(C)$ reach the same state;
see [20] for a proof.

[Figure 2: (a) source code of the program P, in three columns Thread 0, Thread 1, and Thread 2; (b) its unfolding, with events numbered 1 to 19, each annotated with its label.]

Fig. 2. (a) A program $P$; (b) its unfolding semantics $\mathcal{U}_{P,\diamond_P}$.


Definition 1 (Unfolding). Given a program $P$ and some independence relation
$\diamond$ on $M_P := \langle S, \to, A, s_0 \rangle$, the unfolding of $P$ under $\diamond$, denoted $\mathcal{U}_{P,\diamond}$, is
the PES over $A$ constructed by the following fixpoint rules:


1. Start with a PES $\mathcal{E} := \langle E, <, \#, h \rangle$ equal to $\langle \emptyset, \emptyset, \emptyset, \emptyset \rangle$.

2. Add a new event $e := \langle a, C \rangle$ to $E$ for any configuration $C \in \mathit{conf}(\mathcal{E})$ and any
action $a \in A$ if $a$ is enabled at $\mathit{state}(C)$ and $\neg(a \diamond h(e'))$ holds for every
$<$-maximal event $e'$ in $C$.

3. For any new $e$ in $E$, update $<$, $\#$, and $h$ as follows: for every $e' \in C$, set
$e' < e$; for any $e' \in E \setminus C$, set $e' \# e$ if $e \neq e'$ and $\neg(a \diamond h(e'))$; set $h(e) := a$.

4. Repeat steps 2 and 3 until no new event can be added to $E$; return $\mathcal{E}$.


Step 1 creates an empty PES with only one (empty) configuration. Step 2 inserts
a new event $\langle a, C \rangle$ by finding a configuration $C$ that enables an action $a$ which is
dependent with all causality-maximal events in $C$. In Fig. 2, this initially creates
events 1, 8, and 15. For event $1 := \langle \langle 0, \mathsf{loc} \rangle, \emptyset \rangle$, this is because action $\langle 0, \mathsf{loc} \rangle$
is enabled at $\mathit{state}(\emptyset) = s_0$ and there is no $<$-maximal event in $\emptyset$ to consider.
Similarly, the state of $C_1 := \{1, 8, 9, 10\}$ enables action $a_1 := \langle 0, \mathsf{acq}\ m \rangle$, and
both $h(1)$ and $h(10)$ are dependent with $a_1$ in $\diamond_P$. As a result, $\langle a_1, C_1 \rangle$ is an
event (number 11). Furthermore, while $a_2 := \langle 0, \mathsf{loc} \rangle$ is enabled at $\mathit{state}(C_2)$,
with $C_2 := \{8, 9, 10\}$, $a_2$ is independent of $h(10)$ and $\langle a_2, C_2 \rangle$ is not an event.

After inserting an event $e := \langle a, C \rangle$, Definition 1 declares all events in $C$
causal predecessors of $e$. For any event $e'$ in $E$ but not in $[e]$ such that $h(e')$
is dependent with $a$, the order of execution of $e$ and $e'$ yields different states.
We thus set them in conflict. In Fig. 2, we set $2 \# 8$ because $h(2)$ is dependent
with $h(8)$ and $2 \notin [8]$ and $8 \notin [2]$.
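The fixpoint of Definition 1 can be executed literally on a very small program. The sketch below is ours and makes several simplifying assumptions: the toy program is hypothetical (two threads running lock(m); unlock(m) and a third thread with one local statement), memory is ignored so a state is a pair of program counters and locked mutexes, events are represented by their canonical names (action, history), and configurations are enumerated by brute force, so it only scales to a handful of events.

```python
from itertools import combinations

# Hypothetical toy program (ours): each thread is a fixed list of effects.
# Memory is ignored, so a state is (program counters, frozenset of locked mutexes).
PROGRAM = [
    [("acq", "m"), ("rel", "m")],  # thread 0: lock(m); unlock(m)
    [("acq", "m"), ("rel", "m")],  # thread 1: lock(m); unlock(m)
    ["loc"],                       # thread 2: one local statement
]
S0 = ((0,) * len(PROGRAM), frozenset())


def independent(a, a2):
    """diamond_P from Sect. 2; an action is (thread, effect)."""
    def base(b, b2):
        return b == "loc" or (b[0] == "acq" and b2 not in (("acq", b[1]), ("rel", b[1])))
    return a[0] != a2[0] and (base(a[1], a2[1]) or base(a2[1], a[1]))


def enabled(state):
    pcs, locked = state
    acts = []
    for i, pc in enumerate(pcs):
        if pc < len(PROGRAM[i]):
            b = PROGRAM[i][pc]
            if b == "loc" or b[0] == "rel" or b[1] not in locked:
                acts.append((i, b))
    return acts


def step(state, action):
    pcs, locked = state
    i, b = action
    pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
    if b != "loc":
        locked = (locked | {b[1]}) if b[0] == "acq" else (locked - {b[1]})
    return pcs, locked


def state_of(c):
    """Replay one interleaving; sorting by history size respects causality."""
    s = S0
    for e in sorted(c, key=lambda e: len(e[1])):
        s = step(s, e[0])
    return s


def configurations(events, conflict):
    """All configurations, by brute force over subsets."""
    evs, confs = list(events), []
    for r in range(len(evs) + 1):
        for subset in combinations(evs, r):
            c = frozenset(subset)
            if all(e[1] <= c for e in c) and not any(
                    frozenset((e, f)) in conflict for e in c for f in c):
                confs.append(c)
    return confs


def unfold():
    """Fixpoint of Definition 1; an event is its canonical name (action, history)."""
    events, conflict = set(), set()
    changed = True
    while changed:
        changed = False
        for c in configurations(events, conflict):
            maxima = [e for e in c if not any(e in f[1] for f in c)]
            for a in enabled(state_of(c)):                          # step 2
                e = (a, c)
                if e in events or any(independent(a, m[0]) for m in maxima):
                    continue
                for f in events:                                    # step 3
                    if f not in c and not independent(a, f[0]):
                        conflict.add(frozenset((e, f)))
                events.add(e)
                changed = True
    return events, conflict


events, conflict = unfold()
confs = configurations(events, conflict)
maximal = [c for c in confs if not any(c < d for d in confs)]
print(len(events), len(maximal))  # 9 2
```

On this program the construction yields 9 events and 2 maximal configurations, one per order in which the two threads acquire m; the local statement of thread 2 yields a single event, because it is independent of every action of the other threads.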


3 Unfolding-Based DPOR 


This section presents an algorithm that exhaustively explores all deadlock states 
of a given program (a deadlock is a state where no thread is enabled). 
Algorithm 1. Unfolding-based POR exploration. See text for definitions.

```
 1  Initially, set U := ∅,
 2  and call Explore(∅, ∅, ∅).
 3  Procedure Explore(C, D, A)
 4    Add ex(C) to U
 5    if en(C) ⊆ D return
 6    if A = ∅
 7      Choose e from en(C) \ D
 8    else
 9      Choose e from A ∩ en(C)
10    Explore(C ∪ {e}, D, A \ {e})
11    if ∃J ∈ Alt(C, D ∪ {e})
12      Explore(C, D ∪ {e}, J \ C)
13    U := U ∩ Q_{C,D}

14  Function cex_P(C)
15    R := ∅
16    foreach event e ∈ C of type acq
17      e_t := p_t(e)
18      e_m := p_m(e)
19      while ¬(e_m ≤ e_t) do
20        e_m := p_m(e_m)
21        if (e_m ≤ e_t) break
22        e_m := p_m(e_m)
23        ê := ⟨h(e), [e_t] ∪ [e_m]⟩
24        Add ê to R
25    return R
```

For the rest of the paper, unless otherwise stated, we let $P$ be a terminating
program (i.e., $\mathit{runs}(M_P)$ is a finite set of finite sequences) and $\diamond$ an independence
on $M_P$. Consequently, $\mathcal{U}_{P,\diamond}$ has finitely many events and configurations.

Our POR algorithm (Algorithm 1) analyzes $P$ by exploring the configurations
of $\mathcal{U}_{P,\diamond}$. It visits all $\subseteq$-maximal configurations of $\mathcal{U}_{P,\diamond}$, which correspond to the
deadlock states in $\mathit{reach}(M_P)$, and organizes the exploration as a binary tree.

Algorithm 1 maintains a global set $U$ that stores all events of $\mathcal{U}_{P,\diamond}$ discovered
so far. The three arguments of Explore(C, D, A) are: $C$, the configuration to be
explored; $D$ (for disabled), a set of events that shall never again be added to $C$;
and $A$ (for add), used to direct the exploration towards a configuration that
conflicts with $D$. A call to Explore(C, D, A) visits all maximal configurations of
$\mathcal{U}_{P,\diamond}$ which contain $C$ and no event of $D$, and the first one explored contains $C \cup A$.

The algorithm first adds $\mathit{ex}(C)$ to $U$. If $C$ is a maximal configuration (i.e.,
there is no enabled event), then line 5 returns. If $C$ is not maximal but $\mathit{en}(C) \subseteq D$,
then all possible events that could be added to $C$ have already been explored
and this call was redundant work. In this case the algorithm also returns, and
we say that it has explored a sleep-set blocked (SSB) execution [1]. Algorithm 1
next selects an event enabled at $C$, if possible from $A$ (lines 7 and 9), and makes a
recursive call (left subtree) that explores all configurations that contain all events
in $C \cup \{e\}$ and no event from $D$. Since that call visits all maximal configurations
containing $C$ and $e$, it remains to visit those containing $C$ but not $e$. At line
11 we determine if any such configuration exists. Function Alt returns a set of
configurations, so-called clues. A clue is a witness that a $\subseteq$-maximal configuration
exists in $\mathcal{U}_{P,\diamond}$ which contains $C$ and no event of $D \cup \{e\}$.
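To make the control flow of Explore concrete, here is a small Python sketch (ours, not DPU's). It assumes the whole PES is known in advance, so $U$ is simply that PES and lines 4 and 13 become no-ops; Alt is a brute-force search that returns an alternative in the sense of Definition 4; events are chosen by smallest identifier. The PES itself is hypothetical: two threads each running lock(m); unlock(m), plus a thread with one local step.

```python
from itertools import combinations

# Hypothetical PES (ours), given explicitly.  Events 1, 3, 4, 5 form one branch and
# 2, 6, 7, 8 the other; every event of one branch conflicts with every event of the
# other.  Event 9 (a local step of a third thread) conflicts with nothing.
EVENTS = set(range(1, 10))
CAUSALITY = {(1, 3), (1, 4), (3, 4), (1, 5), (3, 5), (4, 5),
             (2, 6), (2, 7), (6, 7), (2, 8), (6, 8), (7, 8)}   # transitively closed
BRANCH_A, BRANCH_B = {1, 3, 4, 5}, {2, 6, 7, 8}


def conflict(x, y):
    return (x in BRANCH_A and y in BRANCH_B) or (x in BRANCH_B and y in BRANCH_A)


def is_configuration(c):
    closed = all(x in c for (x, y) in CAUSALITY if y in c)
    return closed and not any(conflict(x, y) for x in c for y in c)


def en(c):
    return {e for e in EVENTS - c if is_configuration(c | {e})}


def alt(c, d):
    """Brute-force Alt: the first alternative to d after c (Definition 4), or None."""
    candidates = sorted(EVENTS - d)
    for r in range(len(candidates) + 1):
        for combo in combinations(candidates, r):
            j = set(combo)
            if (is_configuration(j) and is_configuration(c | j)
                    and all(any(conflict(x, y) for x in j) for y in d)):
                return j
    return None


def explore(c, d, a, visited, ssb):
    """Lines 3-12 of Algorithm 1; U is the whole PES, so lines 4 and 13 are omitted."""
    enabled = en(c)
    if enabled <= d:                                      # line 5
        (ssb if enabled else visited).append(frozenset(c))
        return
    e = min(a & enabled) if a else min(enabled - d)       # lines 6-9
    explore(c | {e}, d, a - {e}, visited, ssb)            # line 10
    j = alt(c, d | {e})                                   # line 11
    if j is not None:
        explore(c, d | {e}, j - c, visited, ssb)          # line 12


visited, ssb = [], []
explore(set(), set(), set(), visited, ssb)
print(len(visited), len(ssb))  # 2 0
```

Both maximal configurations are visited and no sleep-set blocked execution occurs. Each call to this brute-force Alt enumerates subsets of events, so its cost is exponential; Sect. 4 shows that some super-polynomial cost is hard to avoid for exact alternatives.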


Definition 2 (Clue). Let $D$ and $U$ be sets of events, and $C$ a configuration
such that $C \cap D = \emptyset$. A clue to $D$ after $C$ in $U$ is a configuration $J \subseteq U$ such
that $C \cup J$ is a configuration and $D \cap J = \emptyset$.
Definition 3 (Alt function). Function Alt denotes any function such that
$\mathrm{Alt}(B, F)$ returns a set of clues to $F$ after $B$ in $U$, and the set is non-empty if
$\mathcal{U}_{P,\diamond}$ has at least one maximal configuration $C$ where $B \subseteq C$ and $C \cap F = \emptyset$.


When Alt returns a clue $J$, the clue is passed in the second recursive call
(line 12) to "mark the way" (using set $A$) in the subsequent recursive calls at
line 10, and to guide the exploration towards the maximal configuration that $J$
witnesses. Definition 3 does not identify a concrete implementation of Alt. It
rather indicates how to implement Alt so that Algorithm 1 terminates and is
complete (see below). Different PORs in the literature can be reframed in terms
of Algorithm 1. SDPOR [1] uses clues that mark the way with only one event
ahead ($|J \setminus C| = 1$) and can hit SSBs. Optimal DPORs [1,19] use clues of varying
size that provably guarantee that the exploration avoids every SSB.

Algorithm 1 is optimal when it does not explore any SSB. To make Algorithm 1
optimal, Alt needs to return clues that are alternatives [19], which satisfy stronger
constraints. When that happens, Algorithm 1 is equivalent to the DPOR in [19]
and becomes optimal (see [20] for a proof).


Definition 4 (Alternative [19]). Let $D$ and $U$ be sets of events and $C$ a
configuration such that $C \cap D = \emptyset$. An alternative to $D$ after $C$ in $U$ is a clue $J$
to $D$ after $C$ in $U$ such that $\forall e \in D \colon \exists e' \in J,\ e \# e'$.


Line 13 removes from $U$ events that will not be necessary for Alt to find
clues in the future. The events preserved, $Q_{C,D} := C \cup D \cup \#(C \cup D)$, include
all events in $C \cup D$ as well as every event in $U$ that is in conflict with some
event in $C \cup D$. The preserved events will suffice to compute alternatives [19],
but other non-optimal implementations of Alt could allow for more aggressive
pruning.

The $\subseteq$-maximal configurations of Fig. 2(b) are $[7] \cup [17]$, $[14]$, and $[19]$. Our
algorithm starts at configuration $C = \emptyset$. After 10 recursive calls it visits
$C = [7] \cup [17]$. Then it backtracks to $C = \{1\}$, calls $\mathrm{Alt}(\{1\}, \{2\})$, which provides,
e.g., $J = \{1, 8\}$, and visits $C = \{1, 8\}$ with $D = \{2\}$. After 6 more recursive calls it
visits $C = [14]$, backtracks to $C = [12]$, calls $\mathrm{Alt}([12], \{2, 13\})$, which provides,
e.g., $J = \{15\}$, and after two more recursive calls it visits $C = [12] \cup \{15\}$
with $D = \{2, 13\}$. Finally, after 4 more recursive calls it visits $C = [19]$.

Finally, we focus on the correctness of Algorithm 1, and prove termination
and completeness of the algorithm:


Theorem 1 (Termination). Regardless of its input, Algorithm 1 always stops. 


Theorem 2 (Completeness). Let $\hat{C}$ be a $\subseteq$-maximal configuration of $\mathcal{U}_{P,\diamond}$.
Then Algorithm 1 calls Explore(C, D, A) at least once with $C = \hat{C}$.


4 Complexity 


This section presents complexity results about the only non-trivial steps in
Algorithm 1: computing $\mathit{ex}(C)$ and the call to $\mathrm{Alt}(\cdot, \cdot)$. An implementation of
$\mathrm{Alt}(B, F)$ that systematically returns $\{B\}$ would satisfy Definition 3, but would
also render Algorithm 1 unusable (equivalent to a DFS in $M_P$). On the other
hand, the algorithm becomes optimal when Alt returns alternatives. Optimality
comes at a cost:


Theorem 3. Given a finite PES $\mathcal{E}$, some configuration $C \in \mathit{conf}(\mathcal{E})$, and a
set $D \subseteq \mathit{ex}(C)$, deciding if an alternative to $D$ after $C$ exists in $E$ is NP-complete.


Theorem 3 assumes that $\mathcal{E}$ is an arbitrary PES. Assuming that $\mathcal{E}$ is the unfolding
of a program $P$ under $\diamond_P$ does not reduce this complexity:


Theorem 4. Let $P$ be a program and $U$ a causally-closed set of events from
$\mathcal{U}_{P,\diamond_P}$. For any configuration $C \subseteq U$ and any $D \subseteq \mathit{ex}(C)$, deciding if an
alternative to $D$ after $C$ exists in $U$ is NP-complete.


These complexity results lead us to consider (in the next section) new approaches
that avoid the NP-hardness of computing alternatives while still retaining their
capacity to prune the search.

Finally, we focus on the complexity of computing $\mathit{ex}(C)$, which essentially
reduces to computing $\mathit{cex}(C)$, as computing $\mathit{en}(C)$ is trivial. Assuming that $\mathcal{E}$
is given, computing $\mathit{cex}(C)$ for some $C \in \mathit{conf}(\mathcal{E})$ is a linear problem. However,
for any realistic implementation of Algorithm 1, $\mathcal{E}$ is not available (the very
goal of Algorithm 1 is to find all of its events). So a useful complexity result
about $\mathit{cex}(C)$ necessarily refers to the original system under analysis. When $\mathcal{E}$
is the unfolding of a Petri net [14], computing $\mathit{cex}(C)$ is NP-complete:


Theorem 5. Let $N$ be a Petri net, $t$ a transition of $N$, $\mathcal{E}$ the unfolding of $N$,
and $C$ a configuration of $\mathcal{E}$. Deciding if $h^{-1}(t) \cap \mathit{cex}(C) = \emptyset$ is NP-complete.


Fortunately, computing $\mathit{cex}(C)$ for programs is a much simpler task. Function
$\mathit{cex}_P(C)$, shown in Algorithm 1, computes and returns $\mathit{cex}(C)$ when $\mathcal{E}$ is the
unfolding of some program. We explain $\mathit{cex}_P(C)$ in detail in Sect. 5.3. But
assuming that functions $p_t$ and $p_m$ can be computed in constant time, and
relation $<$ decided in $O(\log |C|)$, as we will show, clearly $\mathit{cex}_P$ works in time
$O(n^2 \log n)$, where $n := |C|$, as both loops are bounded by the size of $C$.


5 New Algorithm for Computing Alternatives 


This section introduces a new class of clues, called k-partial alternatives. These 
can arbitrarily reduce the number of redundant explorations (SSBs) performed 
by Algorithm 1 and can be computed in polynomial time. Specialized data struc- 
tures and algorithms for k-partial alternatives are also presented. 


Definition 5 (k-partial alternative). Let $U$ be a set of events, $C \subseteq U$ a
configuration, $D \subseteq U$ a set of events, and $k \in \mathbb{N}$ a number. A configuration $J$
is a $k$-partial alternative to $D$ after $C$ if there is some $\hat{D} \subseteq D$ such that $|\hat{D}| = k$
and $J$ is an alternative to $\hat{D}$ after $C$.
A k-partial alternative needs to conflict with only $k$ (instead of all) events
in $D$. An alternative is thus an $\infty$-partial alternative. If we reframe SDPOR in
terms of Algorithm 1, it becomes an algorithm using singleton 1-partial
alternatives. While k-partial alternatives are a very simple concept, most of their
simplicity stems from the fact that they are expressed within the elegant
framework of PES semantics. Defining the same concept on top of sequential
semantics (often used in the POR literature [1,2,9–11,23]) would have required a
much more complex device.

We compute k-partial alternatives using a comb data structure: 


Definition 6 (Comb). Let $A$ be a set. An $A$-comb $c$ of size $n \in \mathbb{N}$ is an
ordered collection of spikes $\langle s_1, \ldots, s_n \rangle$, where each spike $s_i \in A^*$ is a sequence
of elements over $A$. Furthermore, a combination over $c$ is any tuple $\langle a_1, \ldots, a_n \rangle$
where $a_i \in s_i$ is an element of the spike.


It is possible to compute k-partial alternatives (and by extension optimal
alternatives) to $D$ after $C$ in $U$ using a comb, as follows:


1. Select $k$ (or $|D|$, whichever is smaller) arbitrary events $e_1, \ldots, e_k$ from $D$.

2. Build a $U$-comb $\langle s_1, \ldots, s_k \rangle$ of size $k$, where spike $s_i$ contains all events in $U$
in conflict with $e_i$.

3. Remove from $s_i$ any event $\hat{e}$ such that either $[\hat{e}] \cup C$ is not a configuration or
$[\hat{e}] \cap D \neq \emptyset$.

4. Find combinations $\langle e'_1, \ldots, e'_k \rangle$ in the comb satisfying $\neg(e'_i \# e'_j)$ for $i \neq j$.

5. For any such combination, the set $J := [e'_1] \cup \ldots \cup [e'_k]$ is a k-partial alternative.


Step 3 guarantees that $J$ is a clue. Steps 1 and 2 guarantee that it will conflict
with at least $k$ events from $D$. It is straightforward to prove that the procedure
will find a k-partial alternative to $D$ after $C$ in $U$ when an $\infty$-partial alternative
to $D$ after $C$ exists in $U$. It can thus be used to implement Definition 3.
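Steps 1 to 5 translate almost line by line into code. The sketch below (ours) runs them on a hypothetical seven-event PES. Step 1 picks the $k$ smallest event identifiers, which is one admissible arbitrary choice, and step 4 enumerates the combinations of Definition 6 with `itertools.product`. It decides causality and conflict by table look-up; Sects. 5.1 and 5.2 are about making exactly those queries fast.

```python
from itertools import product

# Hypothetical PES (ours): events 1..7, causality pairs (e, e') meaning e < e'
# (transitively closed), and symmetric conflicts.
CAUSALITY = {(6, 5), (2, 7)}
CONFLICTS = {frozenset(p) for p in [(1, 3), (2, 4), (3, 4), (1, 5), (2, 5), (1, 7)]}


def conflict(x, y):
    return frozenset((x, y)) in CONFLICTS


def local_config(e):
    """[e]: e together with its causal predecessors."""
    return {x for (x, y) in CAUSALITY if y == e} | {e}


def is_configuration(c):
    closed = all(x in c for (x, y) in CAUSALITY if y in c)
    return closed and not any(conflict(x, y) for x in c for y in c)


def k_partial_alternatives(u, c, d, k):
    """Enumerate k-partial alternatives to d after c in u via a comb (steps 1-5)."""
    chosen = sorted(d)[:k]                                              # step 1
    comb = [[x for x in sorted(u) if conflict(x, e)] for e in chosen]   # step 2
    comb = [[x for x in spike                                           # step 3
             if is_configuration(local_config(x) | c) and not local_config(x) & d]
            for spike in comb]
    for combo in product(*comb):                                        # step 4
        if all(not conflict(a, b) for a in combo for b in combo):
            yield set().union(*(local_config(x) for x in combo))        # step 5


U = set(range(1, 8))
print([sorted(j) for j in k_partial_alternatives(U, set(), {1, 2}, 2)])
# [[3, 5, 6], [4, 5, 6], [5, 6]]
```

With $D = \{1, 2\}$ and $k = 1$ the sketch returns $\{3\}$ and $\{5, 6\}$, each in conflict with event 1; event 7 is pruned in step 3 because $2 \in [7]$. With $k = 2$ every result conflicts with both 1 and 2.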

Steps 2, 3, and 4 require deciding whether a given pair of events is in conflict.
Similarly, step 3 requires deciding whether two events are causally related.
Efficiently computing k-partial alternatives thus reduces to efficiently computing
causality and conflict between events.


5.1 Computing Causality and Conflict for PES Events 


In this section we introduce an efficient data structure for deciding whether two 
events in the unfolding of a program are causally related or in conflict. 

As in Sect. 3, let $P$ be a program, $M_P$ its LTS semantics, and $\diamond_P$ its
independence relation (defined in Sect. 2). Additionally, let $\mathcal{E}$ denote the PES
$\mathcal{U}_{P,\diamond_P}$ of $P$ extended with a new event $\bot$ that causally precedes every event in
$\mathcal{U}_{P,\diamond_P}$.

The unfolding $\mathcal{E}$ represents the dependency of actions in $M_P$ through the
causality and conflict relations between events. By definition of $\diamond_P$ we know
that for any two events $e, e' \in E$:


– If $e$ and $e'$ are events from the same thread, then they are either causally
related or in conflict.
– If $e$ and $e'$ are lock/unlock operations on the same variable, then similarly
they are either causally related or in conflict.


This means that the causality/conflict relations between all events of one
thread can be tracked using a tree. For every thread of the program we define
and maintain a so-called thread tree. Each event of the thread has a corresponding
node in the tree. A tree node $n$ is the parent of another tree node $n'$ iff the event
associated with $n$ is the immediate causal predecessor of the event associated
with $n'$. That is, the ancestor relation of the tree encodes the causality relations
of events in the thread, and the branching of the tree represents conflict. Given
two distinct events $e, e'$ of the same thread, we have that $e < e'$ iff the tree node
of $e$ is an ancestor of the tree node of $e'$, and $e \# e'$ iff neither of the two nodes is
an ancestor of the other.

We apply the same idea to track causality/conflict between $\mathsf{acq}$ and $\mathsf{rel}$
events. For every lock $l \in \mathcal{L}$ we maintain a separate lock tree, containing a
node for each event labelled by either $\langle \mathsf{acq}, l \rangle$ or $\langle \mathsf{rel}, l \rangle$. As before, the ancestor
relation in a lock tree encodes the causality relations of all events represented in
that tree. Events of type $\mathsf{acq}$/$\mathsf{rel}$ have tree nodes in both their lock and thread
trees. Events for $\mathsf{loc}$ actions are associated with only one node, in the thread tree.

This idea gives a procedure to decide a causality/conflict query for two events
when they belong to the same thread or operate on the same lock. But we still
need to decide causality and conflict for other events, e.g., $\mathsf{loc}$ events of different
threads. Again by construction of $\diamond_P$, the only source of conflict/causality for
such events are the causality/conflict relations between the causal predecessors
of the two. These relations can be summarized by keeping two mappings for each
event:


Definition 7. Let $e \in E$ be an event of $\mathcal{E}$. We define the thread mapping
$\mathit{tmax} \colon E \times \mathbb{N} \to E$ as the only function that maps every pair $\langle e, i \rangle$ to the unique
$<$-maximal event from thread $i$ in $[e]$, or $\bot$ if $[e]$ contains no event from thread $i$.
Similarly, the lock mapping $\mathit{lmax} \colon E \times \mathcal{L} \to E$ maps every pair $\langle e, l \rangle$ to the
unique $<$-maximal event $e' \in [e]$ such that $h(e')$ is an action of the form $\langle \mathsf{acq}, l \rangle$
or $\langle \mathsf{rel}, l \rangle$, or $\bot$ if no such event exists in $[e]$.


The information stored by the thread and lock mappings enables us to decide 
causality and conflict queries for arbitrary pairs of events: 


Theorem 6. Let $e, e' \in E$ be two arbitrary events from threads $i$ and $j$,
respectively, with $i \neq j$. Then $e < e'$ holds iff $e \leq \mathit{tmax}(e', i)$. And $e \# e'$ holds
iff there is some $l \in \mathcal{L}$ such that $\mathit{lmax}(e, l) \# \mathit{lmax}(e', l)$.


As a consequence of Theorem 6, deciding whether two events are related by
causality or conflict reduces to deciding whether, of two nodes from the same
lock or thread tree, one is an ancestor of the other.
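The following sketch (ours) applies Theorem 6 to a hypothetical unfolding of two threads running lock(m); unlock(m) plus a thread with one local step. To keep it short, it assumes the trees are given as parent maps and derives $\mathit{tmax}$ and $\mathit{lmax}$ from explicitly listed local configurations, whereas an implementation would maintain these mappings as events are added; ancestor checks here simply walk parent pointers.

```python
BOT = 0  # the extra event ⊥ (our numbering); root of every tree

# Hypothetical unfolding (ours): thread of each event and its local configuration [e].
THREAD = {1: 0, 3: 0, 7: 0, 8: 0, 2: 1, 6: 1, 4: 1, 5: 1, 9: 2}
LOCAL = {1: {1}, 2: {2}, 3: {1, 3}, 4: {1, 3, 4}, 5: {1, 3, 4, 5},
         6: {2, 6}, 7: {2, 6, 7}, 8: {2, 6, 7, 8}, 9: {9}}
# Parent pointers: thread trees (one dict for all threads) and the lock tree of m.
THREAD_PARENT = {1: BOT, 3: 1, 7: BOT, 8: 7, 2: BOT, 6: 2, 4: BOT, 5: 4, 9: BOT}
LOCK_PARENT = {"m": {1: BOT, 3: 1, 4: 3, 5: 4, 2: BOT, 6: 2, 7: 6, 8: 7}}


def is_anc(parent, x, y):
    """True iff x is an ancestor of y or x == y, by walking up from y."""
    while y is not None:
        if y == x:
            return True
        y = parent.get(y)
    return False


def depth(parent, x):
    return 0 if x == BOT else 1 + depth(parent, parent[x])


def tmax(e, i):
    cands = [x for x in LOCAL[e] if THREAD[x] == i]
    return max(cands, key=lambda x: depth(THREAD_PARENT, x), default=BOT)


def lmax(e, l):
    tree = LOCK_PARENT[l]
    cands = [x for x in LOCAL[e] if x in tree]
    return max(cands, key=lambda x: depth(tree, x), default=BOT)


def causal(e, f):
    """e < f."""
    if e == f:
        return False
    if THREAD[e] == THREAD[f]:
        return is_anc(THREAD_PARENT, e, f)
    return is_anc(THREAD_PARENT, e, tmax(f, THREAD[e]))       # Theorem 6


def in_conflict(e, f):
    """e # f."""
    if THREAD[e] == THREAD[f]:
        return not is_anc(THREAD_PARENT, e, f) and not is_anc(THREAD_PARENT, f, e)
    return any(not is_anc(t, lmax(e, l), lmax(f, l)) and not is_anc(t, lmax(f, l), lmax(e, l))
               for l, t in LOCK_PARENT.items())                   # Theorem 6


print(causal(1, 4), in_conflict(3, 6), in_conflict(9, 1))  # True True False
```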


5.2 Computing Causality and Conflict for Tree Nodes 


This section presents an efficient algorithm to decide whether a node of a tree
is an ancestor of another. The algorithm is similar to a search in a skip list [18].
Let $\langle N, \lessdot, r \rangle$ denote a tree, where $N$ is a set of nodes, $\lessdot \subseteq N \times N$ is the
parent relation, and $r \in N$ is the root. Let $d(n)$ be the depth of each node in
the tree, with $d(r) = 0$. A node $n$ is an ancestor of $n'$ if it belongs to the only
path from $r$ to $n'$. Finally, for a node $n \in N$ and some integer $g \in \mathbb{N}$ such that
$g \leq d(n)$, let $q(n, g)$ denote the unique ancestor $n'$ of $n$ such that $d(n') = g$.

Given two distinct nodes $n, n' \in N$, we need to efficiently decide whether $n$ is
an ancestor of $n'$. The key idea is that if $d(n) = d(n')$, then the answer is clearly
negative; and if the depths are different and w.l.o.g. $d(n) < d(n')$, then $n$ is an
ancestor of $n'$ iff nodes $n$ and $n'' := q(n', d(n))$ are the same node.

To find $n''$ from $n'$, a linear traversal of the branch starting from $n'$ would
be expensive for deep trees. Instead, we propose to use a data structure similar
to a skip list. Each node stores a pointer to the parent node and also a number
of pointers to ancestor nodes at distances $s^1, s^2, s^3, \ldots$, where $s \in \mathbb{N}$ is a
user-defined step. The number of pointers stored at a node $n$ is equal to the
number of trailing zeros in the $s$-ary representation of $d(n)$. For instance, for
$s := 2$ a node at depth 4 stores 2 pointers (apart from the pointer to the parent)
pointing to the nodes at depth $4 - s^1 = 2$ and depth $4 - s^2 = 0$. Similarly, a node
at depth 12 stores a pointer to its parent (at depth 11) and pointers to the
ancestors at depths 10 and 8. With this algorithm, computing $q(n, g)$ requires
traversing $\log(d(n) - g)$ nodes of the tree.
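A minimal Python sketch of this skip-list-like structure, assuming nodes are only ever appended below existing ones (as when an unfolding grows); the class and function names are ours. Each new node computes its skip pointers with the same search it later serves, and `ancestor_at` greedily takes the longest pointer that does not overshoot depth $g$, falling back to the parent pointer.

```python
class Node:
    """Tree node with a parent pointer plus skip pointers at distances s^1, ..., s^t,
    where t is the number of trailing zeros of its depth written in base s."""

    def __init__(self, parent=None, step=2):
        if parent is None and step < 2:
            raise ValueError("step must be at least 2")
        self.parent = parent
        self.step = step if parent is None else parent.step
        self.depth = 0 if parent is None else parent.depth + 1
        self.skips = []                       # skips[j - 1]: ancestor at distance step**j
        d, dist = self.depth, self.step
        while d > 0 and d % self.step == 0:
            self.skips.append(parent.ancestor_at(self.depth - dist))
            d //= self.step
            dist *= self.step

    def ancestor_at(self, g):
        """q(n, g): the unique ancestor of this node at depth g (0 <= g <= depth)."""
        if not 0 <= g <= self.depth:
            raise ValueError("depth out of range")
        node = self
        while node.depth > g:
            for target in reversed(node.skips):   # try the longest jump first
                if target.depth >= g:
                    node = target
                    break
            else:
                node = node.parent
        return node


def is_ancestor(n, m):
    """True iff n is a proper ancestor of m."""
    return n.depth < m.depth and m.ancestor_at(n.depth) is n


# Usage: a branch of depth 40 with step s = 2, plus a side node below depth 5.
root = Node(step=2)
chain = [root]
for _ in range(40):
    chain.append(Node(chain[-1]))
side = Node(chain[5])
print([a.depth for a in chain[12].skips])  # [10, 8]
```

The pointers stored at depths 4 and 12 match the example above; `is_ancestor` is the depth comparison followed by a single $q$ lookup described in the previous paragraph.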


5.3 Computing Conflicting Extensions 


We now explain how function $\mathit{cex}_P(C)$ in Algorithm 1 works. A call to $\mathit{cex}_P(C)$
constructs and returns all events in $\mathit{cex}(C)$. The function works only when the
PES being explored is the unfolding of a program $P$ under the independence $\diamond_P$.

Owing to the properties of $\mathcal{U}_{P,\diamond_P}$, all events in $\mathit{cex}(C)$ are labelled by $\mathsf{acq}$
actions. Broadly speaking, this is because only actions from different threads
that are co-enabled and dependent create conflicts in $\mathcal{U}_{P,\diamond_P}$, and this is
only possible for $\mathsf{acq}$ statements. For the same reason, an event labelled by
$a := \langle i, \langle \mathsf{acq}, l \rangle \rangle$ exists in $\mathit{cex}(C)$ iff there is some event $e \in C$ such that $h(e) = a$.

Function $\mathit{cex}_P$ exploits these facts and the lock tree introduced in Sect. 5.1
to compute $\mathit{cex}(C)$. Intuitively, it finds every event $e$ labelled by an $\langle \mathsf{acq}, l \rangle$
statement and tries to "execute" it before the $\langle \mathsf{rel}, l \rangle$ that happened before $e$
(if there is one). If it can, it creates a new event $\hat{e}$ with the same label as $e$.

Function $p_t(e)$ returns the only immediate causal predecessor of event $e$ in
its own thread. For an $\mathsf{acq}$/$\mathsf{rel}$ event $e$, function $p_m(e)$ returns the parent node
of event $e$ in its lock tree (or $\bot$ if $e$ is the root). So for an $\mathsf{acq}$ event it returns
a $\mathsf{rel}$ event, and for a $\mathsf{rel}$ event it returns an $\mathsf{acq}$ event.
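The loop of $\mathit{cex}_P$ (lines 14 to 25 of Algorithm 1) can be traced on a small hypothetical configuration in which threads 0, 1, and 2 each execute lock(m); unlock(m), in that order. The sketch is ours: it takes $p_t$ and $p_m$ as given maps, uses the reflexive causal order ≤ with $\bot$ below every event, and computes local configurations by following $p_t$ and $p_m$ instead of the tree queries of Sects. 5.1 and 5.2. A new event is returned as a pair (label, history).

```python
from functools import lru_cache

BOT = 0  # the extra event ⊥ (our numbering)

# Hypothetical configuration C (ours): threads 0, 1, 2 each run lock(m); unlock(m),
# in that order, giving events 1..6.
LABEL = {1: (0, ("acq", "m")), 2: (0, ("rel", "m")),
         3: (1, ("acq", "m")), 4: (1, ("rel", "m")),
         5: (2, ("acq", "m")), 6: (2, ("rel", "m"))}
P_T = {1: BOT, 2: 1, 3: BOT, 4: 3, 5: BOT, 6: 5}   # p_t: parent in the thread tree
P_M = {1: BOT, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}       # p_m: parent in the lock tree of m


@lru_cache(maxsize=None)
def local(e):
    """[e] without ⊥: e and its causal predecessors, following p_t and p_m."""
    if e == BOT:
        return frozenset()
    return frozenset({e}) | local(P_T[e]) | local(P_M.get(e, BOT))


def leq(x, y):
    """x <= y in causality (⊥ precedes every event)."""
    return x == BOT or x in local(y)


def cex_p(c):
    """Lines 14-25 of Algorithm 1: conflicting extensions as (label, history) pairs."""
    result = set()
    for e in sorted(c):
        if LABEL[e][1][0] != "acq":
            continue
        et, em = P_T[e], P_M[e]
        while not leq(em, et):
            em = P_M[em]              # the acq matching the rel em
            if leq(em, et):
                break
            em = P_M[em]              # the rel preceding that acq, or ⊥
            result.add((LABEL[e], local(et) | local(em)))
    return result


for label, hist in sorted(cex_p(set(LABEL)), key=lambda x: (x[0][0], sorted(x[1]))):
    print(label, sorted(hist))
```

For this configuration the function returns three conflicting extensions: thread 1 acquiring m with empty history, and thread 2 acquiring m either with empty history or right after thread 0 released it (history $\{1, 2\}$).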


6 Experimental Evaluation 


We implemented QPOR in a new tool called DPU (Dynamic Program Unfolder,
available at https://github.com/cesaro/dpu/releases/tag/v0.5.2). DPU is a
stateless model checker for C programs with POSIX threading. It uses the
LLVM infrastructure to parse, instrument, and JIT-compile the program, which
is assumed to be data-deterministic. It implements k-partial alternatives (k is
an input), optimal POR, and context-switch bounding [6].

DPU does not use data races as a source of thread interference for POR.
It will not explore two execution orders for the two instructions that exhibit a
data race. However, it can be instructed to detect and report data races found
during the POR exploration. When requested, this detection happens for a
user-provided percentage of the executions explored by POR.


6.1 Comparison to SDPOR 


In this section we investigate the following experimental questions: (a) How 
does QPOR compare against SDPOR? (b) For which values of k do k-partial 
alternatives yield optimal exploration? 

We use realistic programs that expose complex thread synchronization
patterns, including a job dispatcher, a multiple-producer multiple-consumer
scheme, parallel computation of π, and a thread pool. Complex synchronization
patterns are frequent in these examples, including nested and intertwined critical
sections or conditional interactions between threads based on the processed data,
and provide means to highlight the differences between POR approaches and
drive improvement. Each program contains between 2 and 8 assertions, often
ensuring invariants of the used data structures. All programs are safe and have
between 90 and 200 lines of code. We also considered the SV-COMP’17
benchmarks, but almost all of them contain very simple synchronization patterns,
not representative of more complex concurrent algorithms. On these benchmarks
QPOR and SDPOR perform an almost identical exploration, both time out on
exactly the same instances, and both find exactly the same bugs.

In Table 1, we present a comparison between DPU and NIDHUGG [2], an
efficient implementation of SDPOR for multithreaded C programs. We run
k-partial alternatives with $k \in \{1, 2, 3\}$ and optimal alternatives. The number of
SSB executions dramatically decreases as $k$ increases. With $k = 3$ almost no
instance produces SSBs (except MPC(4,5)) and optimality is achieved with
$k = 4$. Programs with simple synchronization patterns, e.g., the PI benchmark,
are explored optimally both with $k = 1$ and by SDPOR, while more complex
synchronization patterns require $k > 1$.

Overall, if the benchmark exhibits many SSBs, the run time reduces as k
increases, and optimal exploration is the fastest option. However, when the
benchmark contains few SSBs (cf. MPAT, PI, POL), k-partial alternatives can
be slightly faster than optimal POR, an observation in line with previous
literature [1]. Code profiling revealed that when the comb is large and contains
many solutions, both optimal and non-optimal POR will easily find them, but
optimal POR spends additional time constructing a larger comb. This suggests
that optimal POR would profit from a lazy comb construction algorithm.

DPU is faster than NIDHUGG in the majority of the benchmarks because it
can greatly reduce the number of SSBs. In the cases where both tools explore the
same set of executions, DPU is in general faster than NIDHUGG because it
JIT-compiles the program, while NIDHUGG interprets it. All the benchmarks in
Table 1 are data-race free, but NIDHUGG cannot be instructed to ignore data
races and will attempt to revert them. DPU was run with data-race detection
disabled. Enabling it incurs approximately 10% overhead. In contrast with
previous observations [1,2], the results in Table 1 show that SSBs can
dramatically slow down the execution of SDPOR.

Table 1. Comparing QPOR and SDPOR. Machine: Linux, Intel Xeon 2.4 GHz. TO:
timeout after 8 min. Columns are: Th: nr. of threads; Confs: maximal configurations;
Time in seconds, Memory (Mem) in MB; SSB: sleep-set blocked executions. N/A:
analysis with lower k yielded 0 SSBs.

```
Benchmark              DPU (k=1)     DPU (k=2)     DPU (k=3)     DPU (optimal)  NIDHUGG
Name        Th  Confs  Time   SSB    Time   SSB    Time   SSB    Time   Mem     Time   Mem   SSB
DISP(5,2)    8  137    0.8    1K     0.4    43     0.4    0      0.4    37      12     33    2K
DISP(5,3)    9  2K     5.4    11K    1.3    595    1.0    1      1.0    37      10.8   33    13K
DISP(5,4)   10  15K    58.5   105K   16.4   6K     10.3   213    10.3   87      109    33    115K
DISP(5,5)   11  151K   TO     –      476    53K    280    2K     257    729     TO     33    –
DISP(5,6)   12  ?      TO     –      TO     –      TO     –      TO     1131    TO     33    –
MPAT(4)      9  384    0.5    0      N/A           N/A           0.5    37      0.6    33    0
MPAT(5)     11  4K     2.4    0      N/A           N/A           2.7    37      1.8    33    0
MPAT(6)     13  46K    50.6   0      N/A           N/A           73.2   214     21.5   33    0
MPAT(7)     15  645K   TO     –      TO     –      TO     –      TO     660     359    33    0
MPAT(8)     17  ?      TO     –      TO     –      TO     –      TO     689     TO     33    –
MPC(2,5)     8  60     0.6    560    0.4    0      N/A           0.4    38      2.0    34    3K
MPC(3,5)     9  3K     26.5   50K    3.0    3K     1.7    [?]    1.7    38      [?]    34    90K
MPC(4,5)    10  314K   TO     –      TO     –      391    30K    296    239     TO     33    –
MPC(5,5)    11  ?      TO     –      TO     –      TO     –      TO     834     TO     34    –
PI(5)        6  120    0.4    0      N/A           N/A           0.5    39      19.6   35    0
PI(6)        7  720    0.7    0      N/A           N/A           0.7    39      123    35    0
PI(7)        8  5K     3.5    0      N/A           N/A           4.0    45      TO     34    –
PI(8)        9  40K    48.1   0      N/A           N/A           42.9   246     TO     34    –
POL(7,3)    14  3K     48.5   72K    2.9    1K     1.9    6      1.9    39      74.1   33    90K
POL(8,3)    15  4K     153    214K   5.5    3K     3.0    10     3.0    52      251    33    274K
POL(9,3)    16  5K     464    592K   9.5    5K     4.8    15     4.8    73      TO     33    –
POL(10,3)   17  7K     TO     –      17.2   9K     6.8    21     [?]    99      TO     33    –
POL(11,3)   18  10K    TO     –      27.2   12K    9.7    28     10.6   138     TO     33    –
POL(12,3)   19  12K    TO     –      46.3   20K    13.5   36     16.4   184     TO     33    –
```


6.2 Evaluation of the Tree-Based Algorithms 


We now evaluate the efficiency of our tree-based algorithms from Sect. 5 by
answering: (a) What are the average/maximal depths of the thread/lock
sequential trees? (b) What is the average depth difference on causality/conflict
queries? (c) What is the best step for branch skip lists? We do not compare our
algorithms against others because, to the best of our knowledge, none is available
(other than a naive implementation of the mathematical definition of
causality/conflict).

We run DPU with an optimal exploration over 15 selected programs
from Table 1, with 380 to 204K maximal configurations in the unfolding. In
total, the 15 unfoldings contain 246 trees (150 thread trees and 96 lock trees)
with 5.2M nodes. Figure 3 shows the average depth of the nodes in each tree
(subfigure a) and the maximum depth of the trees (subfigure b), for each of the
246 trees.


368 H. T. T. Nguyen et al. 


[Figure 3 panels: (a) average depth of the tree nodes and (b) maximum depth of the trees, plotted per tree for lock trees and thread trees; (c) depth-distance frequency on causality queries; (d) depth-distance frequency on conflict queries.]

Fig. 3. (a), (b) Depths of trees; (c), (d) frequency of depth distances 


While the average depth of a node is 22.7, as many as 80% of the trees have a
maximum depth of less than 8 nodes, and 90% of them less than 16 nodes. The
average of 22.7 is nevertheless larger because deeper trees contain proportionally
more nodes. The depth of the deepest node of every tree was between 3 and 77.
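The gap between the node average and the per-tree maxima is a weighting effect: an average taken over all nodes counts each tree in proportion to its size. The following Python sketch illustrates this on made-up data (eight chains of depth 5 and one chain of depth 60, each tree represented only by a hypothetical list of its node depths); it is an illustration, not the measurement code used for Fig. 3.

```python
from statistics import mean

# Hypothetical data: each tree is represented only by the depths of its nodes.
shallow = [list(range(1, 6)) for _ in range(8)]  # eight chains, depths 1..5
deep = [list(range(1, 61))]                       # one chain, depths 1..60
trees = shallow + deep

def node_weighted_average(trees):
    """Average depth over all nodes of all trees (large trees weigh more)."""
    return sum(sum(t) for t in trees) / sum(len(t) for t in trees)

def tree_weighted_average(trees):
    """Mean of the per-tree average depths (every tree weighs the same)."""
    return mean(mean(t) for t in trees)

max_depths = [max(t) for t in trees]
print(f"{node_weighted_average(trees):.2f}")                      # 19.50
print(f"{tree_weighted_average(trees):.2f}")                      # 6.06
print(f"{sum(d < 8 for d in max_depths) / len(max_depths):.2f}")  # 0.89
```

Although 8 of the 9 trees have a maximum depth below 8, the single deep tree contributes 60 of the 100 nodes and pulls the node average up to 19.5.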

We next evaluate depth differences in the causality and conflict queries over
these trees. Figure 3(c) and (d) respectively show the frequency of various depth
distances associated with causality and conflict queries made by optimal POR.

Surprisingly, depth differences are very small for both causality and conflict 
queries. When deciding causality between events, as much as 92% of the queries 
were for tree nodes separated by a distance between 1 and 4, and 70% had a 
difference of 1 or 2 nodes. This means that optimal POR, and specifically the 
procedure that adds ex(C) to the unfolding (which is the main source of causality 
queries), systematically performs causality queries which are trivial with the 
proposed data structures. The situation is similar for checking conflicts: 82% of 
the queries are about tree nodes whose depth difference is between 1 and 4. 

These experiments show that most queries on the causality trees require very
short walks, which strongly supports using the data structures proposed in Sect. 5.
Finally, we chose a (rather arbitrary) skip step of 4. We observed that other
values do not significantly impact the run time or memory consumption for most
benchmarks, since the depth difference on causality/conflict requests is very low.
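Sect. 5 is not part of this excerpt, so the following Python sketch only illustrates the general idea under stated assumptions: each tree node stores its depth, its parent, and a hypothetical skip pointer to the ancestor STEP levels above (STEP = 4, the value chosen above). A causality query between two nodes of the same tree then reduces to an ancestor test, done with long jumps followed by a short walk. We also assume, for illustration, that two nodes of the same tree are in conflict exactly when neither is an ancestor of the other.

```python
from typing import Optional

STEP = 4  # skip step (the value used in the experiments above)

class TreeNode:
    """Tree node with depth, parent, and a hypothetical skip pointer."""

    def __init__(self, parent: Optional["TreeNode"] = None) -> None:
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1
        self.skip: Optional["TreeNode"] = None  # ancestor STEP levels up, if any
        if self.depth >= STEP:
            anc = self
            for _ in range(STEP):
                anc = anc.parent
            self.skip = anc

def is_ancestor(a: TreeNode, b: TreeNode) -> bool:
    """True iff a is b or an ancestor of b (a and b in the same tree)."""
    if a.depth > b.depth:
        return False
    node = b
    while node.skip is not None and node.skip.depth >= a.depth:
        node = node.skip    # long jumps of STEP levels
    while node.depth > a.depth:
        node = node.parent  # short walk of fewer than STEP levels
    return node is a

def in_conflict(a: TreeNode, b: TreeNode) -> bool:
    """Assumption: same-tree nodes conflict iff neither is an ancestor of the other."""
    return not is_ancestor(a, b) and not is_ancestor(b, a)

chain = [TreeNode()]
for _ in range(9):
    chain.append(TreeNode(chain[-1]))
branch = TreeNode(chain[3])
print(is_ancestor(chain[2], chain[9]))  # True
print(in_conflict(chain[5], branch))    # True
```

When the depth difference is between 1 and 4, as in most measured queries, the jump loop runs at most once and the walk takes at most three parent steps, which is consistent with the observation that the choice of skip step hardly matters.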


6.3 Evaluation Against the State-of-the-Art on System Code 


We now evaluate the scalability and applicability of DPU on five multithreaded
programs in two Debian packages: blktrace [5], a block layer I/O tracing
mechanism, and mafft [12], a tool for multiple alignment of amino acid or
nucleotide sequences. The code size of these utilities ranges from 2K to 40K LOC,
and mafft is parametric in the number of threads.


We compared DPU against MAPLE [24], a state-of-the-art testing tool for
multithreaded programs, as the top-ranked verification tools from SV-COMP’17
are still unable to cope with such large and complex multithreaded code.
Unfortunately, we could not compare against NIDHUGG because it cannot deal
with the (abundant) C-library calls in these programs.

Table 2. Comparing DPU with MAPLE (same machine). LOC: lines of code; Execs:
nr. of executions; R: safe (S) or unsafe (U). Other columns as before.
Timeout: 8 min.

Benchmark               DPU                     MAPLE
Name        LOC   Th    Time    Execs   R       Time    Execs   R
ADD(4)      40K   5     25.5    24      U       34.5    24      U
ADD(6)      40K   7     48.1    720     U       TO      316     U
ADD(8)      40K   9     TO      14K     U       TO      329     U
ADD(10)     40K   11    TO      ?       U       TO      295     U
BLK(5)      2K    2     0.9     ?       S       ?       ?       S
BLK(18)     2K    2     1.0     180     S       TO      105     S
BLK(20)     2K    2     1.5     1147    S       TO      106     S
BLK(22)     2K    2     2.6     5424    S       TO      108     S
DND(2,4)    16K   3     11.1    80      U       122     80      U
DND(4,2)    16K   5     118     96      S       151     96      S
MDL(1,4)    38K   7     26.1    ?       U       ?       ?       ?
MDL(2,2)    38K   5     29.2    9       U       13.3    9       U
MDL(2,3)    38K   5     46.2    576     U       TO      304     U
MDL(3,2)    38K   7     31.1    256     U       402     256     U
MDL(4,3)    38K   9     ?       ?       U       TO      ?       U
PLA(1,5)    44K   2     22.8    4       ?       ?       ?       ?
PLA(?)      41K   3     37.2    80      U       142.4   80      U
PLA(4,3)    41K   5     160.5   1368    U       TO      266     U
PLA(6,3)    41K   7     TO      4580    U       TO      269     U

Table 2 presents our experimental results. We use DPU with optimal exploration
and the modified version of MAPLE used in [22]. To test the effectiveness of both
approaches in state-space coverage and bug finding, we introduce bugs in 4 of the
benchmarks (ADD, DND, MDL, PLA). For the safe benchmark BLK, we perform
exhaustive state-space exploration using MAPLE’s DFS mode. On this benchmark,
DPU outperforms MAPLE by several orders of magnitude: DPU explores up to 20K
executions covering the entire state space in 10 s, while MAPLE only explores up
to 108 executions in 8 min.

For the remaining benchmarks, we use the random scheduler of MAPLE, considered
to be the best baseline for bug finding [22]. First, we run DPU to retrieve a bound
on the number of random executions, in order to answer whether both tools are
able to find the bug within the same number of executions. MAPLE found bugs
in all buggy programs (except for one variant in ADD), even though DPU greatly
outperforms it and achieves much higher state-space coverage.


6.4 Profiling a Stateless POR 


In order to understand the cost of each component of the algorithm, we profile
DPU on a selection of 7 programs from Table 1. DPU spends between 30%
and 90% of the run time executing the program (65% on average). The remaining
time is spent computing alternatives, distributed as follows: adding events to the
event structure (15% to 30%), building the spikes of a new comb (1% to 50%),
searching for solutions in the comb (less than 5%), and computing conflicting
extensions (less than 5%). Counterintuitively, building the comb is more expensive
than exploring it, even in the optimal case. Filling the spikes seems to be
more memory-intensive than exploring the comb, which exploits data locality.
7 Conclusion


We have shown that computing alternatives in an optimal DPOR exploration is 
NP-complete. To mitigate this problem, we introduced a new approach to com- 
pute alternatives in polynomial time, approximating the optimal exploration 
with a user-defined constant. Experiments conducted on benchmarks including 
Debian packages show that our implementation outperforms current verification 
tools and uses appropriate data structures. Our profiling results show that
running the program is often more expensive than computing alternatives. Hence,
efforts to reduce the number of redundant executions, even if significantly
costly, are likely to reduce the overall execution time.
Abstract. We address the problem of verifying message passing programs,
defined as a set of processes communicating through unbounded
FIFO buffers. We introduce a bounded analysis that explores a special
type of computations, called k-synchronous. These computations can
be viewed as (unbounded) sequences of interaction phases, each phase 
allowing at most k send actions (by different processes), followed by a 
sequence of receives corresponding to sends in the same phase. We give 
a procedure for deciding k-synchronizability of a program, i.e., whether 
every computation is equivalent (has the same happens-before relation) 
to one of its k-synchronous computations. We show that reachability over 
k-synchronous computations and checking k-synchronizability are both 
PSPACE-complete. 


1 Introduction 


Communication with asynchronous message passing is widely used in concurrent 
and distributed programs implementing various types of systems such as cache 
coherence protocols, communication protocols, protocols for distributed agree- 
ment, device drivers, etc. An asynchronous message passing program is built as 
a collection of processes running in parallel, communicating asynchronously by 
sending messages to each other via channels or message buffers. Messages sent 
to a given process are stored in its entry buffer, waiting for the moment they 
will be received by the process. Sending messages is not blocking for the sender 
process, which means that the message buffers are supposed to be of unbounded 
size. 

Such programs are hard to get right. Asynchrony introduces a tremendous 
amount of new possible interleavings between actions of parallel processes, and 
makes it very hard to apprehend the effect of all of their computations. Due 
to this complexity, verifying properties (invariants) of such systems is hard. In
particular, when buffers are ordered (FIFO buffers), the verification of invariants 
(or dually of reachability queries) is undecidable even when each process is finite- 
state [10]. 

Therefore, an important issue is the design of verification approaches that 
avoid considering the full set of computations to draw useful conclusions about 
the correctness of the considered programs. Several such approaches have been 
proposed including partial-order techniques, bounded analysis techniques, etc., 
e.g., [4,6,13,16,23]. Due to the hardness of the problem and its undecidability,
these techniques have different limitations: either applicable only when buffers 
are bounded (e.g., partial-order techniques), or limited in scope, or do not provide 
any guarantees of termination or insight about the completeness of the analysis. 

In this paper, we propose a new approach for the analysis and verification of 
asynchronous message-passing programs with unbounded FIFO buffers, which 
provides a decision procedure for checking state reachability for a wide class of 
programs, and which is also applicable for bounded analysis in the general case.

We first define a bounding concept for prioritizing the enumeration of pro- 
gram behaviors. This concept is guided by our conviction that the behaviors of 
well designed programs can be seen as successions of bounded interaction phases, 
each of them being a sequence of send actions (by different processes), followed 
by a sequence of receive actions (again by different processes) corresponding to 
send actions belonging to the same interaction phase. For instance, interaction 
phases corresponding to rendezvous communications are formed of a single send 
action followed immediately by its corresponding receive. More complex inter- 
actions are the result of exchanges of messages between processes. For instance 
two processes can send messages to each other, and therefore their interaction 
starts with two send actions (in any order), followed by the two corresponding 
receive actions (again in any order). This exchange schema can be generalized 
to any number of processes. We say that an interaction phase is k-bounded, for a 
given k > 0, if its number of send actions is less than or equal to k. For instance 
rendezvous interactions are precisely 1-bounded phases. In general, we call k- 
exchange any k-bounded interaction phase. Given k > 0, we consider that a 
computation is k-synchronous if it is a succession of k-exchanges. It can be seen 
that, in k-synchronous computations, the sum of the sizes of all message buffers
is bounded by k. However, as will be explained later, boundedness of the message
buffers does not guarantee that there is a k such that all computations are
k-synchronous.
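To make the notion of k-exchange concrete, here is a minimal Python sketch that checks whether an execution is literally a succession of k-exchanges. It uses a hypothetical encoding of indexed actions as ("send", p, q, i) and ("rec", q, i), ignores payloads, does not check that the senders of a phase are distinct, and, unlike the formal semantics of Sect. 4 (which allows unmatched sends), requires every send to be received within its own phase.

```python
from typing import List, Tuple

# Hypothetical encoding (payloads omitted): ("send", p, q, i) and ("rec", q, i),
# where i is the message identifier pairing a send with its receive.
Action = Tuple

def is_k_synchronous(execution: List[Action], k: int) -> bool:
    """Is the execution a succession of phases, each made of at most k sends
    followed by the receives of exactly those sends?"""
    i, n = 0, len(execution)
    while i < n:
        start, pending = i, set()
        while i < n and execution[i][0] == "send":  # send part of the phase
            pending.add(execution[i][-1])
            i += 1
        if len(pending) > k:
            return False
        while i < n and execution[i][0] == "rec":   # receive part of the phase
            if execution[i][-1] not in pending:
                return False  # receives a message sent in another phase
            pending.remove(execution[i][-1])
            i += 1
        if pending:
            return False      # some send of this phase is not received in it
        if i == start:
            raise ValueError(f"unknown action kind: {execution[i][0]!r}")
    return True

sync = [("send", "p", "q", 1), ("rec", "q", 1), ("send", "p", "q", 2), ("rec", "q", 2)]
buffered = [("send", "p", "q", 1), ("send", "p", "q", 2), ("rec", "q", 1), ("rec", "q", 2)]
print(is_k_synchronous(sync, 1))      # True
print(is_k_synchronous(buffered, 1))  # False
print(is_k_synchronous(buffered, 2))  # True
```

The execution `buffered` is not 1-synchronous, yet it has the same trace as `sync`; this gap between being k-synchronous and being equivalent to a k-synchronous computation is the situation of the producer/consumer example discussed later in this section.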

Then, we introduce a new bounded analysis which for a given k, consid- 
ers only computations that are equivalent to k-synchronous computations. The 
equivalence relation on computations is based on a notion of trace corresponding 
to a happens-before relation capturing the program order (the order of actions in 
the code of a process) and the precedence order between sends and their corre- 
sponding receives. Two computations are equivalent if they have the same trace, 
i.e., they differ only in the order of causally independent actions. We show that 
this analysis is PSPACE-complete when processes have a finite number of states. 
An important feature of our bounding concept is that it is possible to
decide its completeness for systems composed of finite-state processes, but with 
unbounded message buffers: For any given k, it is possible to decide whether 
every computation of the program (under the asynchronous semantics) is
equivalent to (i.e., has the same trace as) a k-synchronous computation of that
program. When this holds, we say that the program is k-synchronizable¹. Knowing
that a program is k-synchronizable allows one to conclude that an invariant holds
for all computations of the program if no invariant violations have been found
by its k-bounded exchange analysis. Notice that k-synchronizability of a pro- 
gram does not imply that all its behaviours use bounded buffers. Consider for 
instance a program with two processes, a producer that consists of a loop of 
sends, and a consumer that consists of a loop of receives. Although there are 
computations where the entry buffer of the consumer is arbitrarily large, the 
program is 1-synchronizable because all its computations are equivalent to com- 
putations where each message sent by the producer is immediately received by 
the consumer. 

Importantly, we show that checking k-synchronizability of a program, with 
possibly infinite-state processes, can be reduced in linear time to checking state 
reachability under the k-synchronous semantics (i.e., without considering all 
the program computations). Therefore, for finite-state processes, checking
k-synchronizability is in PSPACE, and it is possible to decide invariant properties
without dealing with unbounded message buffers when the programs are
k-synchronizable (the overall complexity being PSPACE).

Then, a method for verifying asynchronous message passing programs can 
be defined, based on iterating k-bounded analyses with increasing value of k, 
starting from k = 1. If for some k, a violation (i.e., reachability of an error state) 
is detected, then the iteration stops and the conclusion is that the program 
is not correct. On the other hand, if for some k, the program is shown to be 
k-synchronizable and no violations have been found, then again the iteration 
terminates and the conclusion is that the program is correct. 
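The iteration can be sketched as the following Python loop. Here `reaches_error(k)` and `is_k_synchronizable(k)` are hypothetical oracles standing for the k-bounded reachability analysis and the k-synchronizability check, and the optional `max_k` cut-off is our own addition, since the iteration need not terminate (see below).

```python
from typing import Callable, Optional, Tuple

def iterate_bounded_analyses(
    reaches_error: Callable[[int], bool],        # hypothetical: error reachable k-synchronously?
    is_k_synchronizable: Callable[[int], bool],  # hypothetical: k-synchronizability check
    max_k: Optional[int] = None,                 # our addition: optional cut-off
) -> Tuple[str, Optional[int]]:
    k = 1
    while max_k is None or k <= max_k:
        if reaches_error(k):
            return ("incorrect", k)  # the violation is a genuine behavior
        if is_k_synchronizable(k):
            return ("correct", k)    # the k-bounded analysis was complete
        k += 1
    return ("unknown", None)         # gave up without a verdict

print(iterate_bounded_analyses(lambda k: False, lambda k: k >= 2))  # ('correct', 2)
print(iterate_bounded_analyses(lambda k: k == 3, lambda k: False))  # ('incorrect', 3)
```

The error check comes first for each k: k-synchronous computations are themselves asynchronous computations, so any violation they exhibit is real, whereas concluding correctness additionally requires k-synchronizability.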

However, it is possible that the program is not k-synchronizable for any k. In 
this case, if the program is correct then the iteration above will not terminate. 
Thus, an important issue is to determine whether a program is synchronizable, 
i.e., there exists a k such that the program is k-synchronizable. This problem is 
hard, and we believe that it is undecidable, but we do not have a formal proof. 

We have applied our theory to a set of nontrivial examples, two of them 
being presented in Sect. 2. All the examples are synchronizable, which confirms 
our conviction that non-synchronizability should correspond to an ill-designed 
system (and therefore it should be reported as an anomaly). 

An extended version of this paper with missing proofs can be found at [9]. 


1 A different notion of synchronizability has been defined in [4] (see Sect. 8).


On the Completeness of Verifying Message Passing Programs 375 
2 Motivating Examples 


We provide in this section examples illustrating the relevance and the appli- 
cability of our approach. Figure 1 shows a commit protocol allowing a client to 
update a memory that is replicated in two processes, called nodes. The access to 
the nodes is controlled by a manager. Figure 2 shows an execution of this pro- 
tocol. This system is 1-synchronizable, i.e., every execution is equivalent to one 
where only rendezvous communication is used. Intuitively, this holds because 
mutually interacting components are never in the situation where messages sent 
from one to the other are crossing messages sent in the other direction (i.e., the 
components are “talking” to each other at the same time). For instance, the 
execution in Fig. 2 is 1-synchronizable because its conflict graph (shown in the 
same figure) is acyclic. Nodes in the conflict graph are matching send-receive 
pairs (numbered from 1 to 6 in the figure), and edges correspond to the program 
order between actions in these pairs. The label of an edge records whether the 
actions related by program order are sends or receives, e.g., the edge from 1 to 
2 labeled by RS represents the fact that the receive of the send-receive pair 1 


[Figure 1: labeled transition systems of Manager m (transitions including rec(m,update), send(m,n1,update), send(m,n2,update), rec(m,OK), rec(m,OK), and send(m,c,OK)), Client c (including send(c,m,update)), Node n1 (including send(n1,m,OK)), and Node n2 (including send(n2,m,OK)).]

Fig. 1. A distributed commit protocol. Each process is defined as a labeled transition 
system. Transitions are labeled by send and receive actions, e.g., send(c, m, update) is 
a send from the client c to the manager m with payload update. Similarly, rec(c, OK) 
denotes process c receiving a message OK. 
Fig. 2. An execution of the distributed commit protocol and its conflict graph.
is before the send of the send-receive pair 2, in program order. For the moment,
these labels should be ignored; their relevance will be discussed in Sect. 5. The
conflict graph being acyclic means that matching pairs of send-receive actions 
are “serializable”, which implies that this execution is equivalent to one where 
every send is immediately followed by the matching receive (as in rendezvous 
communication). 
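The serialization argument can be sketched in Python for matched executions: nodes of the conflict graph are message ids (send-receive pairs), an edge records that an action of one pair precedes an action of another pair in the same process, and a topological order of an acyclic graph yields an execution in which every send is immediately followed by its receive. The action encoding ("send", p, q, i) / ("rec", q, i) is hypothetical, the RS-style edge labels (relevant only in Sect. 5) are ignored, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set, Tuple

Action = Tuple  # hypothetical encoding: ("send", p, q, i) or ("rec", q, i)

def conflict_graph(execution: List[Action]) -> Dict[int, Set[int]]:
    """Map each pair (message id) to the pairs preceding it in program order."""
    preds: Dict[int, Set[int]] = {a[-1]: set() for a in execution}
    for j, a in enumerate(execution):
        for b in execution[j + 1:]:
            if a[1] == b[1] and a[-1] != b[-1]:  # same process, different pairs
                preds[b[-1]].add(a[-1])
    return preds

def rendezvous_order(execution: List[Action]) -> Optional[List[Action]]:
    """Reorder a matched execution so each send is immediately followed by its
    receive; return None if the conflict graph has a cycle."""
    try:
        order = list(TopologicalSorter(conflict_graph(execution)).static_order())
    except CycleError:
        return None
    send = {a[-1]: a for a in execution if a[0] == "send"}
    rec = {a[-1]: a for a in execution if a[0] == "rec"}
    return [x for i in order for x in (send[i], rec[i])]

producer = [("send", "p", "q", 1), ("send", "p", "q", 2), ("rec", "q", 1), ("rec", "q", 2)]
crossing = [("send", "p1", "p2", 1), ("send", "p2", "p1", 2), ("rec", "p2", 1), ("rec", "p1", 2)]
print(rendezvous_order(producer))
print(rendezvous_order(crossing))  # None: pairs 1 and 2 form a cycle
```

In `crossing`, the two processes talk to each other at the same time, which creates the cycle of size 2 characteristic of a 2-exchange.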

Although the message buffers are bounded in all the computations of the 
commit protocol, this is not true for every 1-synchronizable system. There are 
asynchronous computations where buffers have an arbitrarily big size, which 
are equivalent to synchronous computations. This is illustrated by a (family of) 
computations shown in Fig. 4a of the system modeling an elevator described in 
Fig. 3 (a simplified version of the system described in [14]). This system con- 
sists of three processes: User models the user of the elevator, Elevator models the 
elevator’s controller, and Door models the elevator’s door, which reacts to
commands received from the controller. The execution in Fig. 4a models an
interaction where the user sends an unbounded number of requests for closing the
door, which generates an unbounded number of messages in the entry buffer of
Elevator. These computations are 1-synchronizable since they are equivalent to a
1-synchronous computation where Elevator immediately receives every message
sent by User. This is witnessed by the acyclicity of the conflict graph of this
computation (shown on the right of the same figure). It can be checked that the
elevator system without the dashed edge is a 1-synchronizable system.

Consider now a slightly different version of the elevator system where the 
transition from Stopping2 to Opening2 is moved to target Opening1 instead (see 
the dashed transition in Fig. 3). It can be seen that this version reaches exactly 
the same set of configurations (tuples of process local states) as the previous 
one. Indeed, modifying that transition enables Elevator to send a message open 
to Door, but the latter can only be at StopDoor, OpenDoor, or ResetDoor at this 
point, and therefore it can (maybe after sending doorStopped and doorOpened) 
receive at state ResetDoor the message open. However, receiving this message 
doesn’t change Door’s state, and the set of reachable configurations of the system 
remains the same. This version of the system is not 1-synchronizable, as
shown in Fig. 4b: once the doorStopped message sent by Door is received by
Elevator², these two processes can send messages to each other at the same time
(the two send actions happen before the corresponding receives). This mutual
interaction consisting of 2 parallel send actions is called a 2-exchange, and it is
witnessed by the cycle of size 2 in the execution’s conflict graph (shown on the
right of Fig. 4b). In general, it can be shown that every execution of this version
of the elevator system has a conflict graph with cycles of size at most 2, which
implies that it is 2-synchronizable (by the results in Sect. 5).


2 Door sends the message from state StopDoor, and Elevator is at state Stopping2 before 
receiving the message. 
3 Message Passing Systems


We define a message passing system as the composition of a set of processes that 
exchange messages, which can be stored in FIFO buffers before being received 
(we assume one buffer per process, storing incoming messages from all the other 
processes). Each process is described as a state machine that evolves by execut- 
ing send or receive actions. An execution of such a system can be represented 
abstractly using a partially-ordered set of events, called a trace. The partial order 
in a trace represents the causal relation between events. We show that these 
systems satisfy causal delivery, i.e., the order in which messages are received 
by a process is consistent with the causal relation between the corresponding
sends.


[Figure 3: transition systems of User u (send(u,e,openDoor), send(u,e,closeDoor)); Door d (states including OpenDoor, ResetDoor, StopDoor; transitions including rec(d,reset), rec(d,open), send(d,e,doorOpened), rec(d,stop), rec(d,close)); and Elevator e (states Closed1, Closed2, Opening1, Opening2, Opened, Closing1, Closing2, Stopping1, Stopping2; transitions including rec(e,closeDoor), rec(e,openDoor), send(e,d,reset), send(e,d,open), rec(e,doorOpened), send(e,d,close), rec(e,doorClosed), send(e,d,stop)). A dashed transition from Stopping2 to Opening1 marks the variant discussed in Sect. 2.]

Fig. 3. A system modeling an elevator. 


We fix sets $P$ and $V$ of process ids and message payloads, and sets
$S = \{send(p, q, v) : p, q \in P, v \in V\}$ and $R = \{rec(q, v) : q \in P, v \in V\}$ of send actions
and receive actions. Each send $send(p, q, v)$ combines two process ids $p, q$ denoting
the sender and the receiver of the message, respectively, and a message payload
$v$. Receive actions specify the process $q$ receiving the message, and the message
payload $v$. The process executing an action $a \in S \cup R$ is denoted $proc(a)$, i.e.,
$proc(a) = p$ for all $a = send(p, q, v)$ or $a = rec(p, v)$, and the destination $q$ of
a send $s = send(p, q, v) \in S$ is denoted $dest(s)$. The set of send, resp., receive,
actions $a$ of process $p$, i.e., with $proc(a) = p$, is denoted by $S_p$, resp., $R_p$.

A message passing system is a tuple $\mathcal{S} = ((L_p, \delta_p, l_p^0) \mid p \in P)$ where $L_p$ is the
set of local states of process $p$, $\delta_p \subseteq L_p \times (S_p \cup R_p) \times L_p$ is a transition relation
describing the evolution of process $p$, and $l_p^0$ is the initial state of process $p$.
Examples of message passing systems can be found in Figs. 1 and 3.

We fix a set $M$ of message identifiers, and the sets $S_{id} = \{s_i : s \in S, i \in M\}$
and $R_{id} = \{r_i : r \in R, i \in M\}$ of indexed actions. Message identifiers are used
to pair send and receive actions. We denote the message id of an indexed
send/receive action $a$ by $msg(a)$. Indexed send and receive actions $s \in S_{id}$ and
$r \in R_{id}$ are matching, written $s \mapsto r$, when $msg(s) = msg(r)$.

A configuration $c = \langle l, b \rangle$ is a vector $l$ of local states along with a vector $b$ of
message buffers (sequences of message payloads tagged with message identifiers).
The transition relation $\xrightarrow{a}$ (with label $a \in S_{id} \cup R_{id}$) between configurations is
defined as expected. Every send action enqueues the message into the destination’s
buffer, and every receive dequeues a message from the buffer. An execution
of a system $\mathcal{S}$ under the asynchronous semantics is a sequence of indexed actions
which corresponds to applying a sequence of transitions from the initial
configuration (where processes are in their initial states and the buffers are empty).
Let $asEx(\mathcal{S})$ denote the set of these executions. Given an execution $e$, a send action
$s$ in $e$ is called an unmatched send when $e$ contains no receive action $r$ such that
$s \mapsto r$. An execution $e$ is called matched when it contains no unmatched send.
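The asynchronous semantics can be made concrete with a small replay function. In this Python sketch the encoding is hypothetical: `delta[p]` holds the transitions of process p as (state, label, state) triples, with labels ("send", q, v) and ("rec", v) seen from p; indexed actions are ("send", p, q, v, i) and ("rec", q, v, i). Each process owns one FIFO buffer, a send enqueues (payload, id) into the destination's buffer, and a receive succeeds only if its message is at the head of the receiver's buffer. Sets of current states are tracked so that nondeterministic processes are handled.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Label = Tuple[str, ...]              # ("send", q, v) or ("rec", v)
Transition = Tuple[str, Label, str]  # (source state, label, target state)

def replay_async(initial: Dict[str, str],
                 delta: Dict[str, Set[Transition]],
                 execution: List[Tuple]) -> bool:
    """Is `execution` an execution of the system under the asynchronous semantics?"""
    states: Dict[str, Set[str]] = {p: {s} for p, s in initial.items()}
    buffers: Dict[str, Deque[Tuple[str, int]]] = {p: deque() for p in initial}
    for action in execution:
        if action[0] == "send":
            _, p, q, v, i = action
            label: Label = ("send", q, v)
        else:
            _, p, v, i = action
            # FIFO: only the message at the head of p's buffer can be received
            if not buffers[p] or buffers[p][0] != (v, i):
                return False
            label = ("rec", v)
        nxt = {t for (s, l, t) in delta[p] if s in states[p] and l == label}
        if not nxt:
            return False  # no transition of p carries this label
        states[p] = nxt
        if action[0] == "send":
            buffers[q].append((v, i))  # enqueue into the destination's buffer
        else:
            buffers[p].popleft()
    return True

initial = {"c": "c0", "m": "m0"}
delta = {"c": {("c0", ("send", "m", "update"), "c1"), ("c1", ("send", "m", "ping"), "c2")},
         "m": {("m0", ("rec", "update"), "m1"), ("m0", ("rec", "ping"), "m2")}}
print(replay_async(initial, delta, [("send", "c", "m", "update", 1),
                                    ("rec", "m", "update", 1)]))  # True
```

The process names, states, and the "ping" payload above are made up for illustration; they loosely echo the client/manager exchange of Fig. 1.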


Traces. Executions are represented using traces which are sets of indexed actions 
together with a program order relating every two actions of the same process 
and a source relation relating a send with the matching receive (if any). 


[Figure 4: (a) a 1-synchronizable execution of User, Elevator, and Door (messages closeDoor, reset, openDoor, open, doorOpened) and its conflict graph; (b) a computation with a 2-exchange between Elevator and Door, and its conflict graph.]


Fig. 4. Executions of the elevator. 


Formally, a trace is a tuple $t = (A, po, src)$ where $A \subseteq S_{id} \cup R_{id}$, $po \subseteq A^2$
defines a total order between actions of the same process, and $src \subseteq S_{id} \times R_{id}$ is
a relation s.t. $src(a, a')$ iff $a \mapsto a'$. The trace $tr(e)$ of an execution $e$ is $(A, po, src)$
where $A$ is the set of all actions in $e$, $po(a, a')$ iff $proc(a) = proc(a')$ and $a$ occurs
before $a'$ in $e$, and $src(a, a')$ iff $a \mapsto a'$. Examples of traces can be found in Figs. 2
and 4. The union of $po$ and $src$ is acyclic. Let $asTr(\mathcal{S}) = \{tr(e) : e \in asEx(\mathcal{S})\}$
be the set of traces of $\mathcal{S}$ under the asynchronous semantics.

Traces abstract away the order of non-causally related actions, e.g., two sends 
of different processes that could be executed in any order. Two executions have 
the same trace when they only differ in the order between such actions. Formally,
given an execution $e = e_1 \cdot a \cdot a' \cdot e_2$ with $tr(e) = (A, po, src)$, where
$e_1, e_2 \in (S_{id} \cup R_{id})^*$ and $a, a' \in S_{id} \cup R_{id}$, we say that $e' = e_1 \cdot a' \cdot a \cdot e_2$ is derived
from $e$ by a valid swap iff $(a, a') \notin po \cup src$. A permutation $e'$ of an execution $e$ is
conflict-preserving when $e'$ can be derived from $e$ through a sequence of valid swaps.
For simplicity, whenever we use the term permutation we mean conflict-preserving
permutation. For instance, a permutation of $send_1(p_1, q, \_)\ send_2(p_2, q, \_)\ rec_1(q, \_)\ rec_2(q, \_)$
is $send_1(p_1, q, \_)\ rec_1(q, \_)\ send_2(p_2, q, \_)\ rec_2(q, \_)$, and a permutation of the
execution $send_1(p_1, q_1, \_)\ send_2(p_2, q_2, \_)\ rec_2(q_2, \_)\ rec_1(q_1, \_)$ is
$send_1(p_1, q_1, \_)\ rec_1(q_1, \_)\ send_2(p_2, q_2, \_)\ rec_2(q_2, \_)$.

Note that the executions having the same trace are permutations of one
another. Also, a system $\mathcal{S}$ cannot distinguish between permutations of executions
or, equivalently, executions having the same trace.
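Traces are straightforward to compute from executions. This Python sketch uses a hypothetical encoding of indexed actions, ("send", p, q, i) for $send_i(p, q, \_)$ and ("rec", q, i) for $rec_i(q, \_)$ with payloads omitted; it builds $tr(e) = (A, po, src)$, tests the valid-swap condition on adjacent actions, and reproduces the first permutation example above.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: ("send", p, q, i) is send_i(p, q, _), ("rec", q, i) is rec_i(q, _).
Action = Tuple

def proc(a: Action) -> str:
    return a[1]

def msg(a: Action) -> int:
    return a[-1]

def matches(s: Action, r: Action) -> bool:
    return s[0] == "send" and r[0] == "rec" and msg(s) == msg(r)

def trace(execution: List[Action]) -> Tuple[FrozenSet, FrozenSet, FrozenSet]:
    """tr(e) = (A, po, src); combinations() preserves the order of e."""
    po = frozenset((a, b) for a, b in combinations(execution, 2) if proc(a) == proc(b))
    src = frozenset((s, r) for s in execution for r in execution if matches(s, r))
    return frozenset(execution), po, src

def is_valid_swap(execution: List[Action], j: int) -> bool:
    """May execution[j] and execution[j + 1] be swapped, i.e. (a, a') not in po ∪ src?"""
    a, b = execution[j], execution[j + 1]
    return proc(a) != proc(b) and not matches(a, b)

e = [("send", "p1", "q", 1), ("send", "p2", "q", 2), ("rec", "q", 1), ("rec", "q", 2)]
e2 = [("send", "p1", "q", 1), ("rec", "q", 1), ("send", "p2", "q", 2), ("rec", "q", 2)]
print(is_valid_swap(e, 1))    # True: send_2 and rec_1 are unrelated
print(trace(e) == trace(e2))  # True: e2 is a permutation of e
```

For adjacent actions, membership in po reduces to "same process", which is why `is_valid_swap` does not need the full trace.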


Causal Delivery. The asynchronous semantics ensures a property known as 
causal delivery, which intuitively, says that the order in which messages are 
received by a process q is consistent with the “causal” relation between them. 
Two messages are causally related if, for instance, they were sent by the same
process p or one of the messages was sent by a process p after the other one was
received by the same process p. This property is ensured by the fact that the
message buffers have a FIFO semantics and a sent message is instantaneously
enqueued in the destination’s buffer. For instance, the trace (execution) on the
left of Fig. 5 satisfies causal delivery. In particular, the messages v1 and v3 are
causally related, and they are received in the same order by q2. On the right of
Fig. 5, we give a trace where the messages v1 and v3 are causally related, but
received in a different order by q2, thus violating causal delivery. This trace is
not valid because the message v1 would be enqueued in the buffer of q2 before
send(p, q1, v2) is executed and thus, before send(q1, q2, v3) as well.


[Figure 5: processes p, q1, q2 executing send(p,q2,v1), send(p,q1,v2), rec(q1,v2), send(q1,q2,v3), and the receives rec(q2,v1) and rec(q2,v3); on the left q2 receives v1 before v3, on the right v3 before v1.]


Fig. 5. A trace satisfying causal delivery (on the left) and a trace violating causal 
delivery (on the right). 
[Figure 6: processes p, q1, q2; the unmatched send(p,q2,v1) yields B(q2) = {p}; the exchange of send(p,q1,v2) yields B(q2) = {p, q1}; it is followed by send(q1,q2,v3), whose receive rec(q2,v3) is disabled.]

Fig. 6. An execution of the 1-synchronous semantics. 


Formally, for a trace $t = (A, po, src)$, the transitive closure of $po \cup src$, denoted
by $\leadsto^+$, is called the causal relation of $t$. For instance, for the trace $t$ on the left of
Fig. 5, we have that $send(p, q_2, v_1) \leadsto^+ send(q_1, q_2, v_3)$. A trace $t$ satisfies causal
delivery if for every two send actions $s_1$ and $s_2$ in $A$,

$$(s_1 \leadsto^+ s_2 \wedge dest(s_1) = dest(s_2)) \Rightarrow (\nexists r_2 \in A.\ s_2 \mapsto r_2) \vee (\exists r_1, r_2 \in A.\ s_1 \mapsto r_1 \wedge s_2 \mapsto r_2 \wedge (r_2, r_1) \notin po)$$

It can be easily proved that every trace $t \in asTr(\mathcal{S})$ satisfies causal delivery.
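The causal delivery condition can be checked directly on the trace of an execution. This Python sketch uses the same hypothetical encoding as before (("send", p, q, i) and ("rec", q, i)), computes the causal relation by a naive transitive closure adequate only for small examples, and accepts the left trace of Fig. 5 while rejecting the right one.

```python
from itertools import combinations, product

# Hypothetical encoding: ("send", p, q, i) and ("rec", q, i); a[1] is proc(a).
def causal_relation(execution):
    """Return po and the transitive closure of po ∪ src for tr(execution)."""
    po = {(a, b) for a, b in combinations(execution, 2) if a[1] == b[1]}
    src = {(s, r) for s, r in product(execution, repeat=2)
           if s[0] == "send" and r[0] == "rec" and s[-1] == r[-1]}
    closure = po | src
    while True:  # naive fixpoint, fine for small examples
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return po, closure
        closure |= step

def satisfies_causal_delivery(execution) -> bool:
    po, causal = causal_relation(execution)
    rec_of = {a[-1]: a for a in execution if a[0] == "rec"}
    sends = [a for a in execution if a[0] == "send"]
    for s1, s2 in product(sends, repeat=2):
        if (s1, s2) in causal and s1[2] == s2[2]:  # s1 ~>+ s2, same destination
            r1, r2 = rec_of.get(s1[-1]), rec_of.get(s2[-1])
            if r2 is None:
                continue                           # s2 is unmatched
            if r1 is None or (r2, r1) in po:
                return False                       # received out of causal order
    return True

left = [("send", "p", "q2", 1), ("send", "p", "q1", 2), ("rec", "q1", 2),
        ("send", "q1", "q2", 3), ("rec", "q2", 1), ("rec", "q2", 3)]
right = left[:4] + [("rec", "q2", 3), ("rec", "q2", 1)]
print(satisfies_causal_delivery(left))   # True
print(satisfies_causal_delivery(right))  # False
```

Messages 1, 2, and 3 stand for v1, v2, and v3 of Fig. 5.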


4 Synchronizability 


We define a property of message passing systems called k-synchronizability as 
the equality between the set of traces generated by the asynchronous semantics 
and the set of traces generated by a particular semantics called k-synchronous. 

The k-synchronous semantics uses an extended version of the standard 
rendez-vous primitive where more than one process is allowed to send a mes- 
sage and a process can send multiple messages, but all these messages must be 
received before being allowed to send more messages. This primitive is called 
k-exchange if the number of sent messages is at most k. For instance, the execution
$send_1(p_1, p_2, \_)\ send_2(p_2, p_1, \_)\ rec_1(p_2, \_)\ rec_2(p_1, \_)$ is an instance of a
2-exchange. To ensure that the k-synchronous semantics is prefix-closed (if it
admits an execution, then it admits all its prefixes), we allow messages to be
dropped during a k-exchange transition. For instance, the prefix of the previous
execution without the last receive ($rec_2(p_1, \_)$) is also an instance of a 2-exchange.
The presence of unmatched send actions must be constrained in order to ensure
that the set of executions admitted by the k-synchronous semantics satisfies
causal delivery. Consider, for instance, the sequence of 1-exchanges in Fig. 6: a
1-exchange with one unmatched send, followed by two 1-exchanges with matching
pairs of sends/receives. The receive action rec(q2, v3), pictured as an empty
box, needs to be disabled in order to exclude violations of causal delivery. To
this end, the semantics tracks for each process p a set of processes B(p) from which
it is forbidden to receive messages. For the sequence of 1-exchanges in Fig. 6,
the unmatched send(p, q2, v1) disables any receive by q2 of a message sent by
p (otherwise, this would even violate the FIFO semantics of q2’s buffer).
Therefore, the first 1-exchange results in B(q2) = {p}. The second 1-exchange
(the message from p to q1) forbids q2 to receive any message from q1. Otherwise,
this message would necessarily be causally related to v1, and receiving it would
lead to a violation of causal delivery. Therefore, when reaching send(q1, q2, v3), the
receive rec(q2, v3) is disabled because q1 ∈ B(q2).


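$$
\frac{
\begin{array}{c}
e \in S_{id}^{*} \cdot R_{id}^{*} \qquad |e| \leq 2 \cdot k \qquad \langle l, \vec{\epsilon} \rangle \xrightarrow{e} \langle l', b \rangle \text{ for some } b \qquad \forall s, r \in e.\ s \mapsto r \Rightarrow proc(s) \notin B(dest(s)) \\
B'(q) = B(q) \cup \{ p : \exists s \in e \cap S_{id}.\ ((\nexists r \in e.\ s \mapsto r) \wedge p = proc(s) \wedge q = dest(s)) \vee (proc(s) \in B(q) \wedge dest(s) = p) \}
\end{array}
}{
(l, B) \stackrel{e}{\Longrightarrow}_k (l', B')
}\ \textsc{k-Exchange}
$$

Fig. 7. The synchronous semantics. Above, $\vec{\epsilon}$ is a vector where all the components are
$\epsilon$, and $\rightarrow$ is the transition relation of the asynchronous semantics.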


Formally, a configuration c = (l⃗, B) in the synchronous semantics is a vector 
l⃗ of local states together with a function B : P → 2^P. The transition relation 
⇒_k is defined in Fig. 7. A k-EXCHANGE transition corresponds to a sequence 
of transitions of the asynchronous semantics starting from a configuration with 
empty buffers. The sequence of transitions is constrained to be a sequence of 
at most k sends followed by a sequence of receives. The receives are enabled 
depending on previous unmatched sends, as explained above, using the function 
B. The semantics defined by ⇒_k is called the k-synchronous semantics. 
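The bookkeeping of B in the k-EXCHANGE rule can be illustrated with a small Python sketch. It abstracts away local states and asynchronous transitions and assumes a hypothetical encoding: an exchange is the list of its sends (sender, destination, payload) plus one flag per send telling whether that send is matched by a receive in the same exchange. The function name `k_exchange` is illustrative only, and the update of B follows our reading of Fig. 7. Replaying the Fig. 6 scenario shows the last receive being disabled.

```python
from typing import Dict, List, Optional, Set, Tuple

Send = Tuple[str, str, str]        # (sender, destination, payload)
Blocked = Dict[str, Set[str]]      # B: process -> processes it may not receive from


def k_exchange(sends: List[Send], matched: List[bool],
               B: Blocked, k: int) -> Optional[Blocked]:
    """One k-exchange (sketch): return B', or None if the exchange is not enabled."""
    if len(sends) > k:                      # at most k sends per exchange
        return None
    for (proc, dest, _), m in zip(sends, matched):
        if m and proc in B.get(dest, set()):
            return None                     # receive disabled: proc is in B(dest)
    new_B = {q: set(ps) for q, ps in B.items()}
    for (proc, dest, _), m in zip(sends, matched):
        if not m:
            # Unmatched send: dest may never receive from proc afterwards.
            new_B.setdefault(dest, set()).add(proc)
        for q, ps in B.items():
            # q cannot receive from proc; a later message from dest to q would be
            # causally after a message from proc that q never gets, so block dest too.
            if proc in ps:
                new_B.setdefault(q, set()).add(dest)
    return new_B


# The Fig. 6 scenario with k = 1.
B0: Blocked = {}
B1 = k_exchange([("p", "q2", "v1")], [False], B0, k=1)   # unmatched send
assert B1 is not None
B2 = k_exchange([("p", "q1", "v2")], [True], B1, k=1)    # matched pair p -> q1
assert B2 is not None
print(sorted(B2["q2"]))                                   # ['p', 'q1']
print(k_exchange([("q1", "q2", "v3")], [True], B2, k=1))  # None: rec(q2, v3) is disabled
```

The old B (not B′) is consulted when propagating blocked senders, mirroring the condition proc(s) ∈ B(q) in the rule.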

Executions and traces are defined as in the case of the asynchronous semantics, 
using ⇒_k for some fixed k instead of →. The set of executions, resp. traces, 
of S under the k-synchronous semantics is denoted by sEx_k(S), resp. sTr_k(S). 
The executions in sEx_k(S) and the traces in sTr_k(S) are called k-synchronous. 

An execution e such that tr(e) is k-synchronous is called k-synchronizable. 
We omit k when it is not important. The set of executions generated by a system 
S under the k-synchronous semantics is prefix-closed. Therefore, the set of its 
k-synchronizable executions is prefix-closed as well. Also, k-synchronizable and 
k-synchronous executions are indistinguishable up to permutations. 


Definition 1. A message passing system S is called k-synchronizable when 
asTr(S) = sTr_k(S). 


It can be easily proved that k-synchronizable systems reach exactly the 
same set of local state vectors under the asynchronous and the k-synchronous 
semantics. Therefore, any assertion checking or invariant checking problem for 
a k-synchronizable system S can be solved by considering the k-synchronous 
semantics instead of the asynchronous one. This holds even for the problem of 
detecting deadlocks. Therefore, all these problems become decidable for finite- 
state k-synchronizable systems, whereas they are undecidable in the general case 
(because of the FIFO message buffers). 


382 A. Bouajjani et al. 


5 Characterizing Synchronous Traces 


We give a characterization of the traces generated by the 
k-synchronous semantics that uses a notion of conflict 
graph similar to the one used in conflict serializability [27]. 
The nodes of the conflict graph correspond to pairs of 
matching actions (a send and a receive) or to unmatched 
sends, and the edges represent the program order relation 
between the actions represented by these nodes. 
For instance, an execution with an acyclic conflict 
graph, e.g., the execution in Fig. 2, is “equivalent” to an 
execution where every receive immediately follows the 
matching send. Therefore, it is an execution of the 1-synchronous 
semantics. For arbitrary values of k, the conflict 
graph may contain cycles, but of a particular form. 
For instance, traces of the 2-synchronous semantics may 
contain a cycle of size 2 like the one in Fig. 4(b). More generally, we show that 
the conflict graph of a k-synchronous trace cannot contain cycles of size strictly 
bigger than k. However, this class of cycles is not sufficient to precisely characterize 
the k-synchronous traces. Consider for instance the trace on top of Fig. 8. 
Its conflict graph contains a cycle of size 4 (shown on the bottom), but the trace 
is not 4-synchronous. The reason is that the messages tagged by 1 and 4 must 
be sent during the same exchange transition, but receiving message 4 requires that 
message 3 be sent after message 2 is received. Therefore, it is not possible to schedule 
all the send actions before all the receives. Such scenarios correspond to cycles in 
the conflict graph where at least one receive precedes a send in program order 
(witnessed by an edge labeled RS). We show that excluding such cycles, in 
addition to cycles of size strictly bigger than k, yields a precise characterization of 
k-synchronous traces. 

The conflict graph of a trace t = (A, po, src) is the labeled directed graph 
CG_t = (V, E, ℓ_E) where: (1) the set of nodes V includes one node for each pair of 
matching send and receive actions, and one for each unmatched send action in t, and (2) 
the set of edges E is defined by: (v, v′) ∈ E iff there exist actions a ∈ act(v) and 
a′ ∈ act(v′) such that (a, a′) ∈ po (where act(v) is the set of actions of trace t 
corresponding to the graph node v). The label of the edge (v, v′) records whether 
a and a′ are send or receive actions, i.e., for all X, Y ∈ {S, R}, XY ∈ ℓ_E(v, v′) iff 
a ∈ X_id and a′ ∈ Y_id. 

A direct consequence of previous results on conflict serializability [27] is that a 
trace is 1-synchronous whenever its conflict graph is acyclic. A cycle of a conflict 
graph CG_t is called bad when it contains an edge labeled by RS. Otherwise, it 
is called good. The following result is a characterization of k-synchronous traces. 


Fig. 8. A trace and its 
conflict graph. 


Theorem 1. A trace t satisfying causal delivery is k-synchronous iff every cycle 
in its conflict-graph is good and of size at most k. 


Theorem 1 can be used to define a runtime monitoring algorithm for checking 
k-synchronizability. The monitor records the conflict graph of the trace 
produced by the system and checks whether it contains some bad cycle, or a cycle 
of size bigger than k. While this approach requires dealing with unbounded message 
buffers, the next section shows that this is not necessary: synchronizability 
violations, if any, can be exposed by executing the system under the synchronous 
semantics. 
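A brute-force version of the check suggested by Theorem 1 is sketched below in Python, for small traces only. Assumptions (not from the paper): a trace is a list of (process, kind, message id) in execution order, with kind "S" or "R"; program order is the per-process order of this list; a matched pair or an unmatched send forms one node identified by its message id; and the input already satisfies causal delivery, as Theorem 1 requires. Enumerating simple cycles this way is exponential in general.

```python
from typing import Dict, Iterator, List, Set, Tuple

Action = Tuple[str, str, int]              # (process, "S" or "R", message id)
Edges = Dict[Tuple[int, int], Set[str]]    # (v, v') -> labels such as "SR"


def conflict_graph(trace: List[Action]) -> Edges:
    """Edge (v, v') labelled XY when an X action of v precedes a Y action of v' in po."""
    per_proc: Dict[str, List[Tuple[str, int]]] = {}
    for proc, kind, msg in trace:
        per_proc.setdefault(proc, []).append((kind, msg))
    edges: Edges = {}
    for actions in per_proc.values():
        for i, (k1, m1) in enumerate(actions):
            for k2, m2 in actions[i + 1:]:
                if m1 != m2:
                    edges.setdefault((m1, m2), set()).add(k1 + k2)
    return edges


def simple_cycles(nodes: Set[int], edges: Edges) -> Iterator[List[int]]:
    """Yield each simple cycle once, rooted at its smallest node."""
    succ: Dict[int, List[int]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    for start in sorted(nodes):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in succ.get(node, []):
                if nxt == start:
                    yield path
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))


def is_k_synchronous(trace: List[Action], k: int) -> bool:
    """Theorem 1: every cycle must be good (no RS edge) and have at most k nodes."""
    edges = conflict_graph(trace)
    nodes = {msg for _, _, msg in trace}
    for cycle in simple_cycles(nodes, edges):
        steps = zip(cycle, cycle[1:] + cycle[:1])
        bad = any("RS" in edges[(u, v)] for u, v in steps)
        if bad or len(cycle) > k:
            return False
    return True


# Two processes that first send to each other and then receive: a 2-cycle.
t = [("p", "S", 1), ("q", "S", 2), ("p", "R", 2), ("q", "R", 1)]
print(is_k_synchronous(t, 1), is_k_synchronous(t, 2))   # False True
```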


6 Checking Synchronizability 


We show that checking k-synchronizability can be reduced to a reachability problem 
under the k-synchronous semantics (where message buffers are bounded). 
This reduction holds for arbitrary, possibly infinite-state, systems. More precisely, 
since the set of (asynchronous) executions of a system is prefix-closed, 
if a system S admits a synchronizability violation, then it also admits a borderline 
violation, for which every strict prefix is synchronizable. We show that 
every borderline violation can be “simulated”³ by the synchronous semantics 
of an instrumentation of S where the receipt of exactly one message is delayed 
(during every execution). We describe a monitor that observes executions of the 
instrumentation (under the synchronous semantics) and identifies synchronizability 
violations (there exists a run of this monitor that goes to an error state 
whenever such a violation exists). 


6.1 Borderline Synchronizability Violations 


For a system S, a violation e to k-synchronizability is called borderline when 
every strict prefix of e is k-synchronizable. Figure 9(a) gives an example of a borderline 
violation to 1-synchronizability (it is the same execution as in Fig. 4(b)). 

We show that every borderline violation e ends with a receive action and that this 
action is included in every cycle of CG_{tr(e)} that is bad or exceeds the bound k. 
Given a cycle c = v, v_1, ..., v_n, v of a conflict graph CG_t, the node v is called a 
critical node of c when (v, v_1) is an SX edge with X ∈ {S, R} and (v_n, v) is a 
YR edge with Y ∈ {S, R}. 
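The critical-node condition only inspects the two cycle edges around v. A hedged Python sketch, assuming a hypothetical encoding in which edges map (source, target) to a set of two-letter labels such as "SR":

```python
from typing import Dict, List, Set, Tuple

Edges = Dict[Tuple[int, int], Set[str]]   # edge -> labels such as "SR"


def is_critical(cycle: List[int], edges: Edges, v: int) -> bool:
    """v is critical in the cycle v, v1, ..., vn, v when (v, v1) is SX and (vn, v) is YR."""
    i = cycle.index(v)
    succ = cycle[(i + 1) % len(cycle)]
    pred = cycle[i - 1]                    # wraps around when i == 0
    leaves_by_send = any(lbl[0] == "S" for lbl in edges[(v, succ)])
    enters_by_receive = any(lbl[1] == "R" for lbl in edges[(pred, v)])
    return leaves_by_send and enters_by_receive


# A hypothetical 2-cycle whose edges are both labelled SR.
sr_cycle: Edges = {(1, 2): {"SR"}, (2, 1): {"SR"}}
print(is_critical([1, 2], sr_cycle, 1))    # True
# If node 1 is left through a receive (RS edge), it is no longer critical.
rs_cycle: Edges = {(1, 2): {"RS"}, (2, 1): {"SR"}}
print(is_critical([1, 2], rs_cycle, 1))    # False
```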


Lemma 1. Let e be a borderline violation to k-synchronizability of a system S. 
Then, e = e′ · r for some e′ ∈ (S_id ∪ R_id)* and r ∈ R_id. Moreover, the node v of 
CG_{tr(e)} representing r (and the corresponding send) is a critical node of every 
cycle of CG_{tr(e)} which is bad or of size bigger than k. 


6.2 Simulating Borderline Violations on the Synchronous Semantics 


Let S′ be a system obtained from S by “delaying” the reception of exactly one 
nondeterministically chosen message: S′ contains an additional process π, and 
exactly one message sent by a process in S is nondeterministically redirected⁴ 
to π, which sends it to the original destination at a later time⁵. We show that 
the synchronous semantics of S′ “simulates” a permutation of every borderline 
violation of S. Figure 9(b) shows the synchronous execution of S′ that corresponds 
to the borderline violation in Fig. 9(a). It is essentially the same, except 
that the reception of doorOpened is delayed by sending it to π, who relays it to the 
elevator at a later time. 

3 We refer to the standard notion of (stuttering) simulation where one system mimics 
the transitions of the other system. 

[Figure: (a) an execution of the processes Elevator and Door; (b) the corresponding synchronous execution of S′, in which doorOpened is relayed through π.] 

Fig. 9. A borderline violation to 1-synchronizability. 

The following result shows that the k-synchronous semantics of S’ “simu- 
lates” all the borderline violations of S, modulo permutations. 


Lemma 2. Let e = e1 · send_i(p, q, v) · e2 · rec_i(q, v) be a borderline violation to 
k-synchronizability of S. Then, sEx_k(S′) contains an execution e′ of the form: 

$$e' = e'_1 \cdot send_i(p, \pi, (q, v)) \cdot rec_i(\pi, (q, v)) \cdot e'_2 \cdot send_i(\pi, q, v) \cdot rec_i(q, v)$$

such that e′1 · send_i(p, q, v) · e′2 is a permutation of e1 · send_i(p, q, v) · e2. 


Checking k-synchronizability for S on the system S′ would require that 
every (synchronous) execution of S′ can be transformed into an execution of S 
by applying a homomorphism σ, where the send/receive pair with destination 
π is replaced with the original send action and the send/receive pair initiated 
by π is replaced with the original receive action (all the other actions are left 
unchanged). However, this is not true in general. For instance, S′ may admit an 
execution send_i(p, π, (q, v)) · rec_i(π, (q, v)) · send_j(p, q, v′) · rec_j(q, v′) · send_i(π, q, v) · 
rec_i(q, v), where a message sent after the one redirected to π is received earlier, 
and the two messages were sent by the same process p. This execution is possible 
under the 1-synchronous semantics of S′. Applying the homomorphism σ, we get 
the execution send_i(p, q, v) · send_j(p, q, v′) · rec_j(q, v′) · rec_i(q, v), which violates 
causal delivery and is therefore not admitted by the asynchronous semantics 


4 Meaning that every transition labeled by a send action send(p, q, v) is doubled by 
a transition labeled by send(p, π, (q, v)), and such a send to π is enabled only once 
throughout the entire execution. 

5 The process π stores the message (q, v) it receives in its state and has one transition 
where it can send v to the original destination q. 


On the Completeness of Verifying Message Passing Programs 385 


of S. Our solution to this problem is to define a monitor M_causal, i.e., a process 
that reads every transition label in the execution and advances its local state, 
and that excludes such executions of S′ when run under the synchronous semantics: 
it blocks the system S′ whenever applying some transition would lead to an 
execution which, modulo the homomorphism σ, is a violation of causal delivery. 
This monitor is based on the same principles that we used to exclude violations 
of causal delivery in the synchronous semantics in the presence of unmatched 
sends (the component B of a synchronous configuration). 
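The homomorphism σ, and the kind of violation it can expose, can be sketched in Python. Assumptions, labeled here and in the code: actions are tuples ("send", id, sender, dest, payload) or ("rec", id, receiver, payload); the relayed message keeps the same id on both hops; and only the per-channel FIFO part of causal delivery is checked, which suffices for the example above because both messages are sent by p. This is not the monitor M_causal, which has to handle causal delivery in general.

```python
from typing import Dict, List, Tuple

PI = "pi"      # the extra relay process of S'
Action = Tuple  # ("send", id, sender, dest, payload) or ("rec", id, receiver, payload)


def sigma(execution: List[Action]) -> List[Action]:
    """Map an execution of S' back to S (sketch; the relay keeps the message id)."""
    out: List[Action] = []
    for act in execution:
        if act[0] == "send" and act[3] == PI:      # send(p, pi, (q, v)) -> send(p, q, v)
            _, mid, p, _, (q, v) = act
            out.append(("send", mid, p, q, v))
        elif act[0] == "rec" and act[2] == PI:     # rec(pi, (q, v)) is dropped
            continue
        elif act[0] == "send" and act[2] == PI:    # send(pi, q, v) is dropped
            continue
        else:                                      # rec(q, v) and all other actions stay
            out.append(act)
    return out


def violates_fifo(execution: List[Action]) -> bool:
    """True if some channel (sender, dest) delivers messages out of send order."""
    sent: Dict[Tuple[str, str], List[int]] = {}
    channel_of: Dict[int, Tuple[str, str]] = {}
    delivered: Dict[Tuple[str, str], int] = {}
    for act in execution:
        if act[0] == "send":
            _, mid, src, dst, _ = act
            sent.setdefault((src, dst), []).append(mid)
            channel_of[mid] = (src, dst)
        else:
            chan = channel_of[act[1]]
            n = delivered.get(chan, 0)
            if sent[chan][n] != act[1]:
                return True
            delivered[chan] = n + 1
    return False


# The execution of S' discussed above, with v2 standing for v'.
e_prime = [("send", 1, "p", PI, ("q", "v")), ("rec", 1, PI, ("q", "v")),
           ("send", 2, "p", "q", "v2"), ("rec", 2, "q", "v2"),
           ("send", 1, PI, "q", "v"), ("rec", 1, "q", "v")]
print(violates_fifo(e_prime), violates_fifo(sigma(e_prime)))   # False True
```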


6.3 Detecting Synchronizability Violations 


We complete the reduction of checking k-synchronizability to a reachability problem 
under the k-synchronous semantics by describing a monitor M_viol(k), which 
observes executions in the k-synchronous semantics of S′ ∥ M_causal and checks 
whether they represent violations to k-synchronizability; M_viol(k) goes to an 
error state whenever such a violation exists. 

Essentially, M_viol(k) observes the sequence of k-exchanges in an execution 
and tracks a conflict-graph cycle, if any, interpreting send_i(p, π, (q, v)) · 
rec_i(π, (q, v)) as in the original system S, i.e., as send_i(p, q, v), and send_i(π, q, v) · 
rec_i(q, v) as rec_i(q, v). By Lemma 1, every cycle that is a witness for 
non-k-synchronizability includes, as a critical node, the node representing the pair 
send_i(p, q, v), rec_i(q, v). Moreover, the successor of this node in the cycle represents 
an action that is executed by p and the predecessor an action executed by q. Therefore, 
the monitor searches for a conflict-graph path from a node representing an action of p to a 
node representing an action of q. Whenever it finds such a path, it goes to an 
error state. 

Figure 10 lists the definition of M_viol(k) as an abstract state machine. By 
the construction of S′, we assume w.l.o.g. that both the send to π and the send 
from π are executed in isolation as an instance of 1-exchange. When observing 
the send to π, the monitor updates the variable conflict, which in general 
stores the process executing the last action in the cycle, to p. Also, a variable 
count, which becomes 0 when the cycle has strictly more than k nodes, is initialized 
to k. Then, for every k-exchange transition in the execution, M_viol(k) 
non-deterministically picks pairs of matching send/receive or unmatched sends 
to continue the conflict-graph path, knowing that the last node represents an 
action of the process stored in conflict. The rules for choosing pairs of matching 
send/receive to advance the conflict-graph path are pictured on the right 
of Fig. 10 (advancing the conflict-graph path with an unmatched send does not 
modify the value of conflict; it just decrements the value of count). There are 
two cases, depending on whether the last node in the path conflicts with the send 
or the receive of the considered pair. One of the two processes involved in this 
pair of send/receive equals the current value of conflict. Therefore, conflict 
can either remain unchanged or change to the value of the other process. The 
variable lastIsRec records whether the current conflict-graph path ends in a 
conflict due to a receive action. If it is the case, and the next conflict is between 
function conflict: P ∪ {⊥} 
function lastIsRec: B 
function sawRS: B 
function count: N 

rule send_i(p, π, (q, v)) · rec_i(π, (q, v)): 
    conflict := p 
    count := k 

// for every i, dest(s_i) ≠ π and proc(s_i) ≠ π 
rule s_1 ··· s_n · r_1 ··· r_m: k-exchange 
    if (* ∧ ∃i, j. s_i ↦ r_j ∧ conflict ∈ {proc(s_i), dest(s_i)}) 
        if (*) 
            conflict := proc(s_i) 
            if (lastIsRec) sawRS := true 
            lastIsRec := false 
        else 
            conflict := dest(s_i) 
            lastIsRec := true 
        count-- 
    if (* ∧ proc(s_i) = conflict ∧ ∀j. ¬(s_i ↦ r_j)) 
        count-- 
        lastIsRec := false 

rule send_i(π, q, v) · rec_i(q, v): 
    assert conflict = q ⇒ (count > 0 ∧ ¬sawRS) 

[Diagram (right of the figure): the rules for advancing the path with a matched pair from Process p to Process q; the k-exchange leads either to conflict := p ∧ lastIsRec := false or to conflict := q ∧ lastIsRec := true.] 

Fig. 10. The monitor M_viol(k). B is the set of Booleans and N is the set of natural 
numbers. Initially, conflict is ⊥, while lastIsRec and sawRS are false. 


this receive and a send, then sawRS is set to true to record the fact that the 
path contains an RS labeled edge (leading to a potential bad cycle). 

When π sends its message to q, the monitor checks whether the conflict-graph 
path it discovered ends in a node representing an action of q. If this is the case, 
this path together with the node representing the delayed send forms a cycle. 
If sawRS is true, the cycle is bad, and if count has reached the value 0, 
the cycle contains more than k nodes. In both cases, the current execution 
is a violation to k-synchronizability. 
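The path bookkeeping described above can be sketched as a small Python class. It is written directly from the conflict-graph definition of Sect. 5 rather than transcribed from Fig. 10, so details may differ from the actual monitor. The nondeterministic choices (which pair to follow, and whether the path leaves a pair through its sender or its receiver) are passed in by the caller, and every step is assumed to come from a later k-exchange than the previous one. The class name PathMonitorSketch and its methods are hypothetical. The example uses our reading of the two-node cycle of Fig. 9, where Door (d) delays doorOpened to Elevator (e) and the pair send(e, d, open) · rec(d, open) closes the cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathMonitorSketch:
    """Hypothetical conflict-graph path bookkeeping in the spirit of M_viol(k)."""
    k: int
    conflict: Optional[str] = None  # process of the last node's outgoing action (None = bottom)
    last_is_rec: bool = False       # is that outgoing action a receive?
    saw_rs: bool = False            # has the path used an RS-labelled edge?
    count: int = 0                  # reaches 0 once the cycle would exceed k nodes

    def delayed_send(self, p: str) -> None:
        # send(p, pi, (q, v)) . rec(pi, (q, v)): the path starts at p's send.
        self.conflict, self.count = p, self.k
        self.last_is_rec, self.saw_rs = False, False

    def _enter(self, via_send: bool) -> None:
        # RS edge: the path left the previous node through a receive and
        # enters this node through a later send of the same process.
        if self.last_is_rec and via_send:
            self.saw_rs = True
        self.count = max(self.count - 1, 0)   # count ranges over the naturals

    def matched_pair(self, sender: str, receiver: str, exit_at_sender: bool) -> None:
        assert self.conflict in (sender, receiver), "pair must conflict with the path"
        self._enter(via_send=(self.conflict == sender))
        self.conflict = sender if exit_at_sender else receiver
        self.last_is_rec = not exit_at_sender

    def unmatched_send(self, sender: str) -> None:
        assert self.conflict == sender, "an unmatched send only involves its sender"
        self._enter(via_send=True)
        self.last_is_rec = False

    def relay_ok(self, q: str) -> bool:
        # The check performed when pi forwards the delayed message to q.
        return self.conflict != q or (self.count > 0 and not self.saw_rs)


for k in (1, 2):
    m = PathMonitorSketch(k)
    m.delayed_send("d")
    m.matched_pair("e", "d", exit_at_sender=True)   # entered at d's receive, left at e's send
    print(k, m.relay_ok("e"))                       # prints 1 False, then 2 True
```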

The set of executions in the k-synchronous semantics of S′ composed with 
M_causal and M_viol(k), in which the latter goes to an error state, is denoted by 
S′_k ∥ M_causal ∥ ¬M_viol(k). 


Theorem 2. For a given k, a system S is k-synchronizable iff the set of executions 
S′_k ∥ M_causal ∥ ¬M_viol(k) is empty. 


Given a system S, an integer k, and a local state l, the reachability problem 
under the k-synchronous semantics asks whether there exists a k-synchronous 
execution of S reaching a configuration (l⃗, B) with l⃗_p = l for some p ∈ P. Theorem 
2 shows that checking k-synchronizability can be reduced to a reachability 
problem under the k-synchronous semantics. This reduction holds for arbitrary 
(infinite-state) systems, which implies that k-synchronizability can be checked 
using existing assertion checking technology. Moreover, for finite-state systems, 
where each process has a finite number of local states (message buffers can 
still be unbounded), it implies that checking this property is PSPACE-complete. 
Theorem 3. For a finite-state system S, the reachability problem under the 
k-synchronous semantics and the problem of checking k-synchronizability of S are 
decidable and PSPACE-complete. 


7 Experimental Evaluation 


As a proof of concept, we have applied our procedure for checking 
k-synchronizability to a set of examples extracted from the distribution of the 
P language⁶. Two-phase commit and Elevator are presented in Sect. 2, German is a 
model of the cache-coherence protocol with the same name, OSR is a model of a 
device driver, and Replication Storage is a model of a protocol ensuring eventual 
consistency of a replicated register. These examples cover common message 
communication patterns that occur in different domains: distributed systems 
(Two-phase commit, Replication storage), device drivers (Elevator, OSR), and 
cache-coherence protocols (German). We have rewritten these examples in the 
Promela language and used the Spin model checker⁷ for discharging the reachability 
queries. For a given program, its k-synchronous semantics and the monitors defined 
in Sect. 6 are implemented as ghost code. Finding a conflict-graph cycle which 
witnesses non-k-synchronizability corresponds to violating an assertion. 

Name                  Proc  Loc   k  Time 
Elevator              3     90    2  64.3s 
OSR                   …     …     …  … 
Two-phase commit      4     57    1  1.43s 
Replication storage   6     100   4  92.8s 

Fig. 11. Experimental results. 

The experimental data is listed in Fig. 11: Proc, resp. Loc, is the number of 
processes, resp. the number of lines of code (loc) of the original program, k is 
the minimal integer for which the program is k-synchronizable, and Time is 
the time, in seconds, needed for this check. The ghost code required to check 
k-synchronizability takes 250 lines of code on average. 


8 Related Work 


Automatic verification of asynchronous message passing systems is undecidable 
in general [10]. A number of decidable subclasses have been proposed. The class 
of systems, called synchronizable as well, in [4], requires that a system gener- 
ates the same sequence of send actions when executed under the asynchronous 
semantics as when executed under a synchronous semantics based on rendezvous 
communication. These systems are all 1-synchronizable, but the inclusion is 
strict (the 1-synchronous semantics allows unmatched sends). The techniques 
proposed in [4] to check that a system is synchronizable according to their defi- 
nition cannot be extended to k-synchronizable systems. Other classes of systems 
that are 1-synchronizable have been proposed in the context of session types, 


6 Available at https://github.com/p-org. 
7 Available at http://spinroot.com. 
e.g., [12,20,21,26]. A sound but incomplete proof method for distributed algorithms 
that is based on a similar idea of avoiding reasoning about all program 
computations is introduced in [3]. Our class of synchronizable systems differs also 
from classes of communicating systems that restrict the type of communication, 
e.g., lossy-communication [2], half-duplex communication [11], or the topology of 
the interaction, e.g., tree-based communication in concurrent pushdowns [19, 23]. 

The question of deciding if all computations of a communicating system 
are equivalent (in the language theoretic sense) to computations with bounded 
buffers has been studied in, e.g., [17], where this problem is proved to be 
undecidable. The link between that problem and our synchronizability problem is not 
(yet) clear, mainly because non-synchronizable computations may use bounded 
buffers. 

Our work proposes a solution to the question of defining adequate (in terms 
of coverage and complexity) parametrized bounded analyses for message passing 
programs, providing analogues of concepts such as context-bounding 
or delay-bounding defined for shared-memory concurrent programs. Bounded 
analyses for concurrent systems were initiated by the work on bounded-context 
switch analysis [25,28,29]. For shared-memory programs, this work has been 
extended to unbounded threads or larger classes of behaviors, e.g., [8,15,22,24]. 
Few bounded analyses incomparable to ours have been proposed for message 
passing systems, e.g., [6,23]. Contrary to our work, these works on bounded 
analyses in general do not propose decision procedures for checking if the analysis 
is complete (covers all reachable states). The only exception is [24], which 
concerns shared memory. 

Partial-order reduction techniques, e.g., [1,16], allow one to define equivalence 
classes on behaviors, based on notions of action independence, and to explore (ideally) 
only one representative of each class. This has led to efficient algorithmic 
techniques for enhanced model checking of concurrent shared-memory programs 
that consider only a subset of relevant action interleavings. In the worst case, 
these techniques will still need to explore all of the interleavings. Moreover, these 
techniques are not guaranteed to terminate when the buffers are unbounded. 

The work in [13] defines a particular class of schedulers that, roughly, prioritize 
receive actions over send actions, and which is complete in the sense that it 
allows one to construct the whole set of reachable states. Defining an analysis based 
on this class of schedulers has the same drawback as partial-order reductions: 
in the worst case, it needs to explore all interleavings, and termination is not 
guaranteed. 

The approach in this work is related to robustness checking [5,7]. The general 
paradigm is to decide that a program has the same behaviors under two 
semantics, one being weaker than the other, by showing a polynomial reduction 
to a state reachability problem under the stronger semantics. For instance, in 
our case, the class of finite-state message passing programs with unbounded FIFO channels 
is Turing powerful, but still, surprisingly, k-synchronizability of these programs 
is decidable and PSPACE-complete. The results in [5,7] cannot be applied in 
our context: the class of programs and their semantics are different, and the 
corresponding robustness checking algorithms are based on distinct concepts and 
techniques. 
Abstract. The cornerstone of dynamic partial order reduction (DPOR) 
is the notion of independence that is used to decide whether each pair 
of concurrent events p and t are in a race and thus both p·t and t·p 
must be explored. We present constrained dynamic partial order reduction 
(CDPOR), an extension of the DPOR framework which is able to 
avoid redundant explorations based on the notion of conditional independence: 
the execution of p and t commutes only when certain independence 
constraints (ICs) are satisfied. ICs can be declared by the programmer, 
but importantly, we present a novel SMT-based approach to 
automatically synthesize ICs in a static pre-analysis. A unique feature 
of our approach is that we have succeeded in exploiting ICs within the 
state-of-the-art DPOR algorithm, achieving exponential reductions over 
existing implementations. 


1 Introduction 


Partial Order Reduction (POR) is based on the idea that two interleavings can 
be considered equivalent if one can be obtained from the other by swapping 
adjacent, non-conflicting (independent) execution steps. Such an equivalence class is 
called a Mazurkiewicz trace, and POR guarantees that it is sufficient to explore 
one interleaving per equivalence class. Early POR algorithms [8,10,20] relied 
on static over-approximations to detect possible future conflicts. The 
Dynamic-POR (DPOR) algorithm, introduced by Godefroid [9] in 2005, was a 
breakthrough in the area because it does not need to look at the future. It keeps 
track of the independence races witnessed along its execution and uses them to 
decide the required exploration dynamically, without the need for static 
approximation. DPOR is nowadays considered one of the most scalable techniques for 
software verification. The key to DPOR algorithms is the dynamic construction 
of two types of sets at each scheduling point: the sleep set, which contains 
processes whose exploration has been proved to be redundant (and hence should 
not be selected), and the backtrack set, which contains the processes that have 
not been proved independent of previously explored steps (and hence need to 
be explored). Source-DPOR (SDPOR) [1,2] improves the precision with which 
backtrack sets (named source sets) are computed, proving optimality of the resulting 
algorithm for any number of processes w.r.t. an unconditional independence relation. 


Challenge. When considering (S)DPOR with unconditional independence, if a 
pair of events is not independent in all possible executions, they are treated as 
potentially dependent and their interleavings are explored. Unnecessary exploration 
can be avoided using conditional independence. E.g., two processes executing 
respectively the atomic instructions if(z>0) z=x; and x=x+1; would be 
considered dependent even if z < −1; this is indeed an independence constraint 
(IC) for these two instructions. Conditional independence was introduced early 
in the context of POR [11,15]. The first algorithm that has used notions of 
conditional independence within the state-of-the-art DPOR algorithm is 
Context-Sensitive DPOR (CSDPOR) [3]. However, CSDPOR does not use ICs (it rather 
checks state equivalence dynamically during the exploration) and exploits 
conditional (context-sensitive) independence only partially to extend the sleep sets. 
Our challenge is twofold: (i) extend the DPOR framework to exploit ICs during 
the exploration in order to both reduce the backtrack sets and expand the 
sleep sets as much as possible, and (ii) statically synthesize ICs in an automatic 
pre-analysis. 
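Whether the two atomic instructions above commute under the IC can be sanity-checked by brute force. The Python sketch below models the state as a pair (x, z) and compares both interleavings over a small range of states; it is an illustration on a bounded range, not a proof, and not the SMT-based synthesis developed in this work. The function names are hypothetical.

```python
from itertools import product
from typing import Tuple

State = Tuple[int, int]   # (x, z)


def block_p(s: State) -> State:
    """Atomic block of the first process: if (z > 0) z = x;"""
    x, z = s
    return (x, x) if z > 0 else (x, z)


def block_q(s: State) -> State:
    """Atomic block of the second process: x = x + 1;"""
    x, z = s
    return (x + 1, z)


def commute(s: State) -> bool:
    """Do both interleavings from s reach the same state?"""
    return block_p(block_q(s)) == block_q(block_p(s))


states = list(product(range(-3, 4), repeat=2))
print(all(commute(s) for s in states if s[1] < -1))   # True: the IC z < -1 suffices
print(commute((0, 1)))                                # False: with z > 0 the order matters
print(all(commute(s) for s in states if s[1] <= 0))   # True: z <= 0 would also do
```

The last check suggests that z < −1 is a sufficient but not the weakest IC here: the first block is a no-op whenever z ≤ 0, so that weaker constraint also guarantees commutation.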


Contributions. The main contributions of this work can be summarized as: 


1. We introduce sufficient conditions, which can be checked dynamically, to 
soundly exploit ICs within the DPOR framework. 

2. We extend the state-of-the-art DPOR algorithm with new forms of pruning 
(by means of expanding sleep sets and reducing backtrack sets). 

3. We present an SMT-based approach to automatically synthesize ICs for 
atomic blocks, whose applicability goes beyond the DPOR context. 

4. We experimentally show the exponential gains achieved by CDPOR on some 
typical concurrency benchmarks previously used in the DPOR literature. 


2 Background 


In this section we introduce some notation, the basic notions of POR theory, 
and the state-of-the-art DPOR algorithm that we will extend in Sect. 3. 

Our work is formalized for a general model of concurrent systems, in which 
a program is composed of atomic blocks of code. An atomic block can contain 
just one (global) statement that affects the global state, a sequence of local 
statements (that only read and write the local state of the process) followed by 
a global statement, or a block of code with possibly several global statements 
but whose execution cannot interleave with other processes because it has been 
implemented as atomic (e.g., using locks, semaphores, etc.). Each atomic block 
in the program is given a unique block identifier. We use spawn(P[ini]) to create 
a new process. Depending on the programming language, P can be the name of 
a method and [ini] initial values for the parameters, or P can be the identifier of 
the initial block to execute and [ini] the initialization instructions, etc., in every 
case with mechanisms to continue the execution from one block to the following 
one. Notice that the use of atomic blocks in our formalization generalizes the 
particular case of considering atomicity at the level of single instructions. 

As in previous work on DPOR [1–3], we assume that the state space does not contain 
cycles, that executions have finite but unbounded length, and that processes are 
deterministic (i.e., at a given time there is at most one event a process can execute). 
Let Σ be the set of states of the system. There is a unique initial state s0 ∈ Σ. The 
execution of a process p is represented as a partial function execute_p : Σ ⇀ Σ 
that moves the system from one state to a subsequent state. Each application 
of the function execute_p represents the execution of an atomic block of the code 
that p is running, denoted as an event (or execution step) of process p. An execution 
sequence E (also called a derivation) of a system is a finite sequence of events of 
its processes starting from s0, and it is uniquely characterized by the sequence 
of processes that perform steps of E. For instance, p.q.q denotes the execution 
sequence that first performs one step in p, followed by two steps in q. We use 
ε to denote the empty sequence. The state of the system after E is denoted by 
s[E]. The set of processes enabled in state s (i.e., that can perform an execution 
step from s) is denoted by enabled(s). 
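The notions of execute_p, enabled(s), s[E] and events (p, i) can be made concrete with a toy model in Python. Assumptions, all hypothetical: the state is a dictionary of shared variables plus one program counter per process, each process is a fixed list of atomic blocks, and a process is enabled exactly while it has blocks left, so processes are deterministic and executions are finite.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, int]               # shared variables plus one pc per process
Block = Callable[[State], None]      # an atomic block updates a state in place


def inc_x(s: State) -> None:         # p: x = x + 1
    s["x"] += 1


def copy_x(s: State) -> None:        # q, first block: y = x
    s["y"] = s["x"]


def double_y(s: State) -> None:      # q, second block: y = y * 2
    s["y"] *= 2


PROGRAMS: Dict[str, List[Block]] = {"p": [inc_x], "q": [copy_x, double_y]}


def initial_state() -> State:
    """s0: x = 1, y = 0, and every program counter at 0."""
    return {"x": 1, "y": 0, **{"pc_" + p: 0 for p in PROGRAMS}}


def enabled(s: State) -> List[str]:
    """enabled(s): the processes that still have an atomic block to run."""
    return [p for p, blocks in PROGRAMS.items() if s["pc_" + p] < len(blocks)]


def execute(s: State, p: str) -> Optional[State]:
    """execute_p as a partial function: None when p is not enabled in s."""
    if p not in enabled(s):
        return None
    t = dict(s)
    PROGRAMS[p][t["pc_" + p]](t)
    t["pc_" + p] += 1
    return t


def run(E: str) -> Tuple[State, List[Tuple[str, int]]]:
    """Return s[E] and the events (p, i) of E, given E as a string of process names."""
    s = initial_state()
    events: List[Tuple[str, int]] = []
    occurrences: Dict[str, int] = {}
    for p in E:
        t = execute(s, p)
        if t is None:
            raise ValueError(f"process {p} is not enabled")
        s = t
        occurrences[p] = occurrences.get(p, 0) + 1
        events.append((p, occurrences[p]))   # the i-th occurrence of p
    return s, events


s1, ev = run("pqq")
s2, _ = run("qqp")
print(s1["y"], s2["y"], ev)   # 4 2 [('p', 1), ('q', 1), ('q', 2)]
```

Running p.q.q and q.q.p from s0 ends in different states (y is 4 and 2, respectively), so the step of p and the first step of q do not commute in this model.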


2.1 Basics of Partial Order Reduction 


An event e of the form (p, i) denotes the i-th occurrence of process p in an 
execution sequence, and ê denotes the process p of event e, which is extended to 
sequences of events in the natural way. We write ē to refer to the identifier of 
the atomic block of code the event e is executing. The set of events in execution 
sequence E is denoted by dom(E). We use e <_E e′ to denote that event e occurs 
before event e′ in E, so that <_E establishes a total order between the events in E, and 
E ≤ E′ to denote that sequence E is a prefix of sequence E′. Let dom_[E](w) 
denote the set of events in execution sequence E.w that are in sequence w, i.e., 
dom(E.w) \ dom(E). If w is a single process p, we use next_[E](p) to denote the 
single event in dom_[E](p). If P is a set of processes, next_[E](P) denotes the set of 
next_[E](p) for all p ∈ P. The core concept in POR is that of the happens-before 
partial order among the events in execution sequence E, denoted by →_E. This 
relation defines a subset of the <_E total order, such that any two sequences with 
the same happens-before order are equivalent. Any linearization E′ of →_E on 
dom(E) is an execution sequence with exactly the same happens-before relation 
→_{E′} as →_E. Thus, →_E induces a set of equivalent execution sequences, all with 
the same happens-before relation. We use E ~ E′ to denote that E and E′ are 
linearizations of the same happens-before relation. The happens-before partial 
order has traditionally been defined in terms of a dependency relation between 
Algorithm 1. (Source+Context-sensitive)+Constrained DPOR algorithm 

1: procedure EXPLORE(E) 
2:   if (∃p ∈ (enabled(s[E]) \ sleep(E))) then 
3:     back(E) := {p}; 
4:     while (∃p ∈ (back(E) \ sleep(E))) do 
5:       let n = next_[E](p); 
6:       for all (e ∈ dom(E) such that e ⋖_{E.p} n) do 
7:         let E′ = pre(E, e); 
8:         let u = dep(E, e, n); 
9:         if (¬(U ⊨ (I_{ē,n̄}, e, n, s[E′.u]))) then 
10:          updateBack(E, E′, e, p); 
11:        if C(s[E′.u]) for some C ∈ I_{ē,n̄} then 
12:          add û.p.ê to sleep(E′); 
13:        else 
14:          updateSleepCS(E, E′, e, p); 
15:      sleep(E.p) := {x | x ∈ sleep(E), E ⊨ p ◇ x} 
16:                    ∪ {x | p.x ∈ sleep(E)} 
17:                    ∪ {x | x ∈ sleep(E), |x| = 1, m = next_[E](x), …} 
18:      EXPLORE(E.p); 
19:      sleep(E) := sleep(E) ∪ {p}; 

the execution steps associated to those events [10]. Intuitively, two steps p and 
q are dependent if there is at least one execution sequence E for which they 
do not commute, either because (i) one enables the other (i.e., the execution 
of p leads to introducing q, or viceversa), or because (ii) s[z.p.q) £ S[B.¢.p|- We 
define dep(E,e,n) as the subsequence containing all events e’ in E that occur 
after e and happen-before n in E.p (i.e., e<ge’ and e’— ppn). The unconditional 
dependency relation is used for defining the concept of a race between two events. 
Event e is said to be in race with event e’ in execution EF, if the events belong to 
different processes, e happens-before e’ in E (e >p e’), and the two events are 
“concurrent”, i.e. there exists an equivalent execution sequence E’ ~ E where 
the two events are adjacent. We write e {p e’ to denote that e is in race with 
e’ and that the race can be reversed (i.e., the events can be executed in reverse 
order). POR algorithms use this relation to reduce the number of equivalent 
execution sequences explored, with SDPOR ensuring that only one execution 
sequence in each equivalence class is explored. 
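The definitions above (happens-before as the transitive closure of dependency, and equivalence of execution sequences as sharing the same happens-before relation) can be illustrated with a minimal Python sketch. It assumes that events are (process, label) pairs and that an unconditional dependency predicate `dependent` is supplied by the caller; both are illustrative assumptions, not the paper's data structures, and DPOR implementations maintain this relation incrementally rather than recomputing it.

```python
from itertools import combinations


def happens_before(seq, dependent):
    """Happens-before of `seq` as a set of (earlier_event, later_event) pairs.

    Events are hashable (process, label) tuples. `dependent(a, b)` is an
    assumed unconditional dependency predicate; events of the same process
    are always ordered.
    """
    n = len(seq)
    hb = {(i, j) for i, j in combinations(range(n), 2)
          if seq[i][0] == seq[j][0] or dependent(seq[i], seq[j])}
    # Transitive closure over positions (Floyd-Warshall style).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if (i, k) in hb and (k, j) in hb:
                    hb.add((i, j))
    return {(seq[i], seq[j]) for i, j in hb}


def equivalent(s1, s2, dependent):
    """E ~ E': same events and the same happens-before relation."""
    return (set(s1) == set(s2)
            and happens_before(s1, dependent) == happens_before(s2, dependent))


p, q, r = ("p", 1), ("q", 1), ("r", 1)


def dep(a, b):
    # Illustrative assumption: only the events of p and q are dependent.
    return {a[0], b[0]} == {"p", "q"}


print(equivalent([p, q, r], [p, r, q], dep))  # True: r commutes with p and q
print(equivalent([p, q, r], [q, p, r], dep))  # False: the p/q race is reversed
```

Swapping r with an adjacent independent event keeps the sequence in the same equivalence class, whereas reversing the dependent pair p, q yields a different one, which is exactly what a reversible race captures.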


2.2 State-of-the-Art DPOR with Unconditional Independence 


Algorithm 1 shows the state-of-the-art DPOR algorithm, based on the SDPOR
algorithm of [1,2], which in turn is based on the original DPOR algorithm
of [9].¹ We refer to this algorithm as DPOR in what follows. The context-sensitive
extension of CSDPOR [3] (lines 14 and 16) and our extension highlighted in blue
(lines 8-10, 11-13 and 17) should be ignored for now and will be described in
Sect. 3.

¹ The extension to support wake-up trees [2] is deliberately not included to simplify
the presentation.

The algorithm carries out a depth-first exploration of the execution tree
using POR, receiving as parameter a derivation E (initially empty). Essentially,
it dynamically finds reversible races and is able to backtrack at the appropriate
scheduling points to reverse them. For this purpose, it keeps two sets at every
prefix E' of E: back(E'), the set of processes that must be explored from E',
and sleep(E'), the set of sequences of processes that previous executions
have determined do not need to be explored from E'. Note that in the original
DPOR the sleep set contained only single processes, but later improvements
add sequences of processes, so our description considers this general case.
The algorithm starts by selecting any process p that is enabled in the state
reached after executing E and is not already in sleep(E). If it does not find
any such process p, it stops. Otherwise, after setting back(E) = {p} to start
the search, it explores every element in back(E) that is not in sleep(E). The
backtrack set of E might grow as the loop progresses (due to later executions of
line 10). For each such p, DPOR performs two phases: race detection (lines 6, 7
and 10) and state exploration (lines 15, 18 and 19). The race detection starts by
finding all events e in dom(E) such that $e \lessdot_{E.p} n$, where n is the event being
selected (see line 5). For each such e, it sets E' to pre(E, e), i.e., to the prefix
of E up to, but not including, e. Procedure updateBack modifies back(E')
in order to ensure that the race between e and n is reversed. The source-set
extension of [1,2] detects cases where there is no need to modify back(E'); this
is done within procedure updateBack, whose code is not shown because it is
not affected by our extension. After this, the algorithm continues with the state
exploration phase for E.p, by retaining in its sleep set any element x in sleep(E)
whose events in E.p are independent of the next event of p in E (denoted as
$E \models p \diamond x$), i.e., any x such that $next_{[E]}(p)$ would not happen-before any event in
$dom(E.p.x) \setminus dom(E.p)$. Then, the algorithm explores E.p, and finally it adds p to
sleep(E) to ensure that, when backtracking on E, p is not selected until an event
dependent with it is selected. All versions of the DPOR algorithm (except [3]) rely
on the unconditional (or context-insensitive) dependency relation. This relation
has to be over-approximated, usually by requiring that global variables accessed
by one execution step are not modified by the other.
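The sleep-set update of line 15, together with the sequence-tail rule of line 16 (the CSDPOR rule discussed in Sect. 3.1), can be sketched in Python as below. Sleep sets are modelled as sets of tuples of process names, and the test $E \models p \diamond x$ is passed in as a predicate `indep_with_p`; this predicate is a hypothetical stand-in for the happens-before check described above, not an API from the paper.

```python
def propagate_sleep(sleep, p, indep_with_p):
    """Compute sleep(E.p) from sleep(E), following lines 15-16 of Algorithm 1.

    sleep:        set of tuples of process names (the sequences in sleep(E))
    p:            the process being explored from E
    indep_with_p: assumed predicate standing for E |= p <> x
    """
    # Line 15: keep every sequence whose events are independent of next_[E](p).
    kept = {x for x in sleep if indep_with_p(x)}
    # Line 16: if p.x is in sleep(E), the tail x is propagated down.
    tails = {x[1:] for x in sleep if len(x) > 1 and x[0] == p}
    return kept | tails


# Example 1 with an exact dependency relation: backtracking on state 0 with r,
# sleep(eps) = {p, q}, and only p is independent of r.
print(propagate_sleep({("p",), ("q",)}, "r", lambda x: x == ("p",)))  # {('p',)}
# A sequence q.p in sleep(eps) yields p in sleep(q) when q is explored.
print(propagate_sleep({("q", "p")}, "q", lambda x: False))  # {('p',)}
```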


Example 1. Consider the example in Fig. 1 with 3 processes p, q, r, each containing a
single atomic block. Since all processes have a single event, by abuse of notation,
we refer to events by their process name throughout all examples in the paper.
Relying on the usual over-approximation of dependency, all three pairs of events
are dependent. Therefore, starting with one instance per process, the algorithm
has to explore 6 execution sequences, each with a different happens-before relation.
The tree, including the dotted and dashed fragments, shows the exploration
from the initial state z = −2, x = −2. The value of variable z is shown in brackets
at each state. Essentially, in all states of the form E.e, the algorithm always
finds a reversible race between the next event of the current selected process
(p, q or r) and e, and adds it to back(E). Also, when backtracking on E, none
of the elements in sleep(E) is propagated down, since all events are considered
dependent. In the best case, considering an exact (yet unconditional) dependency
relation which realizes that events p and r are independent, the algorithm
will make the following reductions. In state 6, p and r will not be in race and
hence p will not be added to back(q). This avoids exploring the sequence p.r
from 5. When backtracking on state 0 with r, where sleep(ε) = {p, q}, p will be
propagated down to sleep(r) since $\varepsilon \models r \diamond p$, hence avoiding the exploration of
p.q from 8. Thus, the algorithm will explore 4 sequences.
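Example 1 can be replayed by brute force to see why the six sequences explored under the over-approximated dependency are redundant here. In this Python sketch the bodies of q and r follow the transition systems $T_q$ and $T_r$ given in Sect. 4.1. The body of p is not reproduced in this text, so the sketch assumes, hypothetically, that p executes x = x + 1 (consistent with W(p) = {x} in Example 5 and with p and r being always independent). The same function also exposes the two distinct final states discussed in Example 4.

```python
from itertools import permutations

# Atomic blocks as functions from a state dict to a new state dict.
BLOCKS = {
    "p": lambda s: {**s, "x": s["x"] + 1},  # HYPOTHETICAL body for p
    "q": lambda s: {**s, "z": s["x"]} if s["z"] >= 0 else dict(s),  # T_q
    "r": lambda s: {**s, "x": s["x"] + 1, "z": s["z"] + 1},  # T_r
}


def final_states(init):
    """Map each of the 3! interleavings of p, q, r to its final (x, z)."""
    results = {}
    for order in permutations("pqr"):
        s = dict(init)
        for proc in order:
            s = BLOCKS[proc](s)
        results["".join(order)] = (s["x"], s["z"])
    return results


ex1 = final_states({"x": -2, "z": -2})
print(len(ex1), set(ex1.values()))  # 6 {(0, -1)}
ex4 = final_states({"x": -2, "z": -1})
print(ex4["pqr"], ex4["rqp"])  # (0, 0) (0, -1)
```

From z = −2, x = −2 all six orders agree, so exploring one of them would suffice; from z = −1 the order r.q.p ends in a different state than p.q.r, which is the situation of Example 4.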


Fig. 1. Left: code of the working example (top) and ICs (bottom). Right: execution tree
starting from z = −2, x = −2. Full tree computed by SDPOR, dotted fragment not
computed by CSDPOR, and dashed+dotted fragment not computed by CDPOR.


3 DPOR with Conditional Independence 


Our aim in CDPOR is twofold: (1) provide techniques to both infer and soundly 
check conditional independence, and (2) be able to exploit them at all points 
of the DPOR algorithm where dependencies are used. Section 3.1 reviews the 
notions of conditional independence and ICs, and introduces a first type of check 
where ICs can be directly used in the DPOR algorithm. Section 3.2 illustrates 
why ICs cannot be used at the remaining independence check points in the 
algorithm, and introduces sufficient conditions to soundly exploit them at those 
points. Finally, Sect. 3.3 presents the CDPOR algorithm that includes all types
of checks. 


3.1 Using Precomputed ICs Directly Within DPOR

Conditional independence consists of checking independence at the given state.


Definition 1 (conditional independence). Two events α and β are independent
in state S, written indep(α, β, S), if (i1) none of them enables the other
from S; and (i2) if they are both enabled in S, then $S \xrightarrow{\alpha \cdot \beta} S'$ and $S \xrightarrow{\beta \cdot \alpha} S'$.
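Condition (i2) can be checked on a concrete state by executing both orders and comparing the resulting states, which is, in spirit, the state-equivalence test that CSDPOR performs. The following Python sketch uses the blocks q and r of Fig. 1 in the form of $T_q$ and $T_r$ from Sect. 4.1; condition (i1) is not modelled because these two blocks are always enabled.

```python
def run(blocks, state):
    """Execute a sequence of atomic blocks (state -> state functions)."""
    for block in blocks:
        state = block(state)
    return state


def commute_at(a, b, state):
    """Condition (i2) of Definition 1: a.b and b.a reach the same state."""
    return run([a, b], state) == run([b, a], state)


def q(s):  # T_q: z >= 0 -> z' = x;  z < 0 -> z' = z
    return {**s, "z": s["x"]} if s["z"] >= 0 else dict(s)


def r(s):  # T_r: true -> x' = x + 1, z' = z + 1
    return {**s, "x": s["x"] + 1, "z": s["z"] + 1}


print(commute_at(q, r, {"x": -2, "z": -2}))  # True
print(commute_at(q, r, {"x": -2, "z": -1}))  # False
```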
The use of conditional independence in the POR theory was first studied in [15],
and it has been partially applied within the DPOR algorithm in CSDPOR [3].
Function updateSleepCS at line 14 and the modification of sleep at line 16 encapsulate
this partial application of CSDPOR (the code of updateSleepCS is not shown
because it is not affected by our extension). Intuitively, updateSleepCS works as
follows: when a reversible race is found in the current sequence being explored,
it builds an alternative sequence which corresponds to the reversed race, and then
checks whether the states reached after running the two sequences are the same.
If they are, it adds the alternative sequence to the corresponding sleep set so
that this sequence is not fully explored when backtracking. Therefore, sleep sets
can contain sequences of events which can be propagated down via the rule of
line 16 (i.e., if the event being explored is the head of a sequence in the sleep
set, then the tail of the sequence is propagated down). In essence, the technique
to check (i2) of Definition 1 in CSDPOR consists in checking state equivalence
with an alternative sequence in the current state (hence it is conditional) and, if
the check succeeds, it is exploited in the sleep set only (and not in the backtrack
set).


Example 2. Let us explain the intuition behind the reductions that CSDPOR
is able to achieve w.r.t. unconditional independence-based DPOR on the example.
In state 1, when the algorithm selects q and detects the reversible race
between q and p, it computes the alternative sequence q.p and realizes that
$s_{[p.q]} = s_{[q.p]}$, and hence adds q.p to sleep(ε). Similarly, in state 2, it computes
p.r.q and realizes that $s_{[p.q.r]} = s_{[p.r.q]}$, adding r.q to sleep(p). Besides these
two alternative sequences, it computes two more. Overall, CSDPOR explores 2
complete sequences (p.q.r and q.r.p) and 13 states (the 9 states shown, plus 4
additional states to compute the alternative sequences).


Instead of computing state equivalence to check (i2) as in [3], our approach 
assumes precomputed independence constraints (ICs) for all pairs of atomic 
blocks in the program. ICs will be evaluated at the appropriate state to deter- 
mine the independence between pairs of concurrent events executing such atomic 
blocks. 


Definition 2 (ICs). Consider two events α and β that execute, respectively, the
atomic blocks $\hat{\alpha}$ and $\hat{\beta}$. The independence constraints $I_{\hat{\alpha},\hat{\beta}}$ are a set of boolean
expressions (constraints) on the variables accessed by α and β (including local
and global variables) s.t., if some constraint C in $I_{\hat{\alpha},\hat{\beta}}$ holds in state S, written
C(S), then condition (i2) of indep(α, β, S) holds.


Our first contribution is in lines 11-13, where ICs are used within DPOR as
follows. Before executing updateSleepCS at line 14, we check if some constraint
in $I_{\hat{e},\hat{n}}$ holds in the state $s_{[E'.u]}$, by building the sequence E'.u, where u =
dep(E, e, n). Only if our check fails do we proceed to execute updateSleepCS. The
advantages of our check w.r.t. updateSleepCS are: (1) the alternative execution
sequence built by updateSleepCS is strictly longer than ours and hence more
states will be explored, and (2) updateSleepCS must check state equivalence,
while we evaluate boolean expressions. Yet, because our IC is an approximation,
if we fail to prove independence we can still use updateSleepCS.


Example 3. Consider the ICs in Fig. 1 (bottom left), which provide the constraints
ensuring the independence of each pair of atomic blocks, and whose synthesis
is explained in Sect. 4.1. In the exploration of the example, when the algorithm
detects the reversible race between q and p in state 1, instead of computing
q.p and then comparing $s_{[p.q]} = s_{[q.p]}$ as in CSDPOR, we would just check the
constraint in $I_{p,q}$ at state ε, i.e., in z = −2 (line 11), and since it succeeds, q.p is
added to sleep(ε). The same happens at states 2, again at 1 (when backtracking
with r), and 5. This way we avoid the exploration of the additional 4 states due
to the computation of the alternative sequences in Example 2 (namely q.p, r.p
and r.q from state 0, and r.q from 1). The algorithm is, however, still exploring
many redundant derivations, namely states 4, 5, 6, 7 and 8.
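The check at line 11 only evaluates boolean expressions on the current state. As a sketch, the Python code below encodes $I_{q,r}$ (the three constraints derived in Example 6 of Sect. 4.1) as predicates and checks, by exhaustive testing on a small grid of states (an illustration, not a proof), that whenever some constraint holds, q and r commute. For this pair the constraints also turn out to be exact on the grid, in line with the remark after Example 6.

```python
# I_{q,r} from Example 6, as named predicates over a state dict.
I_QR = [
    ("z >= 0", lambda s: s["z"] >= 0),
    ("z == x", lambda s: s["z"] == s["x"]),
    ("z < -1", lambda s: s["z"] < -1),
]


def ic_holds(ics, state):
    """Line 11 of Algorithm 1: does some constraint C in I hold in state?"""
    return any(check(state) for _, check in ics)


def q(s):  # T_q
    return {**s, "z": s["x"]} if s["z"] >= 0 else dict(s)


def r(s):  # T_r
    return {**s, "x": s["x"] + 1, "z": s["z"] + 1}


def commutes(s):
    """Reference check by execution: q.r and r.q reach the same state."""
    return r(q(s)) == q(r(s))


states = [{"x": x, "z": z} for x in range(-5, 6) for z in range(-5, 6)]
sound = all(commutes(s) for s in states if ic_holds(I_QR, s))
exact = all(ic_holds(I_QR, s) == commutes(s) for s in states)
print(sound, exact)  # True True
```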


3.2 Transitive Uniformity: How to Further Exploit ICs Within DPOR


The challenge now is to use ICs, and therefore conditional independence, at 
the remaining dependency checks performed by the DPOR algorithm, and most 
importantly, for the race detection (line 6). In the example, that would avoid 
the addition of q and r to back(ε) and r to back(p), and hence would make the
algorithm only explore the sequence p.q.r. Although that can be done in our 
example, it is unsound in general as the following counter-example illustrates. 


Example 4. Consider the same example but starting from the initial state z =
−1, x = −2. During the exploration of the first sequence p.q.r, the algorithm
will not find any race since p and q are independent in z = −1, q and r are
independent in z = x = −1, and p and r are always independent. Therefore,
no more sequences than p.q.r, with final result z = 0, will be explored. There is,
however, a non-equivalent sequence, r.q.p, which leads to a different final state
z = −1.


The problem of using conditional independence within the POR theory was
already identified by Katz and Peled [15]. Essentially, the main idea of POR
is that the different linearizations of a partial order yield equivalent executions
that can be obtained by swapping adjacent independent events. However, this
is no longer true with conditional independence. In Example 4, using conditional
independence, the partial order of the explored derivation p.q.r would be empty,
which means there would be 6 possible linearizations. However, r.q.p is not
equivalent to p.q.r, since q and p are dependent in $s_{[r]}$, i.e., when z = 0. An extra
condition, called uniformity, is proposed in [15] to allow using conditional
independence within the POR theory. Intuitively, uniform independence adds a
condition to Definition 1 to ensure that independence holds at all successor states
for those events that are enabled and are uniformly independent with the two
events whose independence is being proved. While this notion can be checked
a posteriori in a given exploration, it is unclear how it could be applied in a
dynamic setting where decisions are made a priori. Here we propose a weaker
notion of uniformity, called transitive uniformity, for which we have been able
to prove that the dynamic-POR framework is sound. The difference with [15] is
that our extra condition ensures that independence holds at all successor states
for all events that are enabled, which is thus a superset of the events considered
in [15]. We note that the general happens-before definition of [1,2] does not
capture our transitive uniform conditional independence below (namely, property
seven of [1,2] does not hold); hence CDPOR cannot be seen as an instance
of SDPOR but rather as an extension.


Definition 3. The transitive uniform conditional independence relation, written
unif(α, β, S), fulfills (i1) and (i2), and (i3) $unif(\alpha, \beta, S_\gamma)$ holds for all
$\gamma \notin \{\alpha, \beta\}$ enabled in S, where $S_\gamma$ is defined by $S \xrightarrow{\gamma} S_\gamma$.


During the exploration of the sequence p.q.r in Example 4, the algorithm will now
find a reversible race between p and q, since the independence is not transitively
uniform in z = −1, x = −2. Namely, (i3) does not hold since r is enabled and
we have x = −1 and z = 0 in $s_{[r]}$, which implies $\neg unif(p, q, s_{[r]})$ ((i2) does not
hold).


We now introduce sufficient conditions for transitive uniformity that can
be precomputed statically, and efficiently checked, in our dynamic algorithm.
Condition (i1) is computed dynamically as usual during the exploration, simply
by storing enabling dependencies. Condition (i2) is provided by the ICs. Our
sufficient conditions to ensure (i3) are as follows. For each atomic block b, we
precompute statically (before executing DPOR) the set W(b) of the global variables
that can be modified by the full execution of b, i.e., by an instruction in b
or by any other block called from, or enabled by, b (transitively). To this end, we
do a simple analysis which consists of two steps: (1) First, we build the call graph
for the program to establish the calling relationships between the blocks in the program.
Note that when we find a process creation instruction spawn(P[ini]) we have a
calling relationship between the block in which the spawn instruction appears
and P. (2) We obtain (by a fixed point computation) the smallest relation fulfilling
that g belongs to W(b) if either g is modified by an instruction in b or g
belongs to W(c) for some block c called from b. This computation can be done
with different levels of precision, and it is well-studied in the static analysis field
[18]. We let G(C) be the set of global variables evaluated on constraint C in I.
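The fixed-point computation of W(b) can be sketched as a naive round-robin iteration in Python. The sketch assumes that the call graph (including spawn edges) and the globals written directly by each block are already available as dictionaries; the dictionaries below are written by hand from the fib/res code of Sect. 4.2 and are not the output of any tool.

```python
def modified_globals(writes, calls):
    """Least fixed point: g is in W(b) if b writes g directly, or if g is in
    W(c) for some block c called or spawned by b.

    writes: block -> set of globals written by an instruction in the block
    calls:  block -> set of blocks it calls or spawns (call graph edges)
    """
    blocks = set(writes) | set(calls)
    W = {b: set(writes.get(b, ())) for b in blocks}
    changed = True
    while changed:  # iterate until no W(b) grows any further
        changed = False
        for b in blocks:
            for c in calls.get(b, ()):
                new = W.get(c, set()) - W[b]
                if new:
                    W[b] |= new
                    changed = True
    return W


# fib spawns fib and res; res writes nr and r and spawns res.
W = modified_globals(writes={"fib": set(), "res": {"nr", "r"}},
                     calls={"fib": {"fib", "res"}, "res": {"res"}})
print(sorted(W["fib"]), sorted(W["res"]))  # ['nr', 'r'] ['nr', 'r']
```

Starting from the direct writes and only ever adding variables yields the least solution; starting from "all globals" would trivially satisfy the closure condition but give no pruning.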


Definition 4 (sufficient condition for transitive uniformity, $U_{>}$). Let E
be a sequence, I a set of constraints, α and β two events enabled in $s_{[E]}$,
and $T = next_{[E]}(enabled(s_{[E]})) \setminus \{\alpha, \beta\}$; we define
$U_{>}(I, \alpha, \beta, s_{[E]}) = \exists C \in I : C(s_{[E]}) \wedge \big(G(C) \cap \bigcup_{\tau \in T} W(\hat{\tau}) = \emptyset\big)$.

Intuitively, our sufficient condition ensures transitive uniformity by checking that 
the global variables involved in the constraint C of the IC used to ensure the 
uniformity condition are not modified by other enabled events in the state. 
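A minimal Python sketch of $U_{>}$ follows, replaying the checks of Example 5 in Sect. 3.3. Each constraint is paired with its set G(C) of global variables. The data are assumptions made for illustration: W(p) = {x} and W(r) = {x, z} follow Example 5 and $T_r$, W(q) = {z} is read off $T_q$, and $I_{p,q} = \{z \leq -1\}$ is reconstructed from Examples 4 and 5, since Fig. 1 is not reproduced here.

```python
def u_check(ics, state, others, W):
    """Sufficient condition U> of Definition 4 (sketch).

    ics:    list of (G(C), predicate) pairs for the constraints C in I
    state:  the state s[E] as a dict
    others: blocks of the enabled events other than alpha and beta (set T)
    W:      block -> global variables the block may modify (transitively)
    """
    touched = set().union(*(W[b] for b in others))
    return any(pred(state) and not (g_vars & touched) for g_vars, pred in ics)


W = {"p": {"x"}, "q": {"z"}, "r": {"x", "z"}}
I_PQ = [({"z"}, lambda s: s["z"] <= -1)]  # reconstructed, see lead-in
I_PR = [(set(), lambda s: True)]
I_QR = [({"z"}, lambda s: s["z"] >= 0),
        ({"x", "z"}, lambda s: s["z"] == s["x"]),
        ({"z"}, lambda s: s["z"] <= -2)]

s0 = {"x": -2, "z": -2}
print(u_check(I_PQ, s0, ["r"], W))  # False: r may write z
print(u_check(I_QR, s0, ["p"], W))  # True: W(p) = {x} does not touch z
print(u_check(I_PR, s0, ["q"], W))  # True: G(true) is empty
```

In the second call the constraint z = x also holds, but it is rejected because p may write x; the existential in Definition 4 is then satisfied by z ≤ −2 instead.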


Theorem 1. Given a sequence E and two events α and β enabled in $s_{[E]}$, we
have that $U_{>}(I_{\hat{\alpha},\hat{\beta}}, \alpha, \beta, s_{[E]}) \Rightarrow unif(\alpha, \beta, s_{[E]})$.
3.3 The Constrained DPOR Algorithm


The code highlighted in blue in Algorithm 1 provides the extension to apply 
conditional independence within DPOR. In addition to the pruning explained in 
Sect. 3.1, it achieves two further types of pruning: 


1. Back-set reduction. The race detection is strengthened with an extra condition 
(line 9) so that e and n (the next event of p) are in race only if they are 
not conditionally independent in state $s_{[E'.u]}$ (using our sufficient condition
above). Here u is the sub-sequence of events of E that occur after e and 
“happen-before” n. This way the conditional independence is evaluated in 
the state after the shortest subsequence so that the events are adjacent in an 
equivalent execution sequence. 

2. Sleep-set extension. An extra condition to propagate down elements in the 
sleep set is added (line 17) s.t. a sequence x, with just one process, is propagated
if its corresponding event is conditionally independent of n in $s_{[E]}$.


It is important to note also that the inferred conditional independencies are 
recorded in the happens-before relation to be later re-used for subsequent computations
of the $\lessdot$ and dep definitions.


Example 5. Let us describe the exploration for the example in Fig. 1 using
our CDPOR. At state 1, the algorithm checks whether p and q are in race.
$U_{>}(I_{p,q}, p, q, s_{[\varepsilon]})$ does not hold in z = −2 since, although $(z \leq -1) \in I_{p,q}$ holds,
we have that $G(z \leq -1) \cap W(r) = \{z\} \neq \emptyset$. Process q is hence added to back(ε).
On the other hand, since $(z \leq -1) \in I_{p,q}$ holds in z = −2 (line 11), q.p is added
to sleep(ε) (line 12). At state 2 the algorithm checks the possible race between
q and r after executing p. This time the transitive uniformity of the independence
of q and r holds since $(z \leq -2) \in I_{q,r}$ holds, and there are no enabled
events out of {q, r}. Our algorithm therefore avoids the addition of r to back(p)
(pruning 1 above). The algorithm also checks the possible race between p and r
in z = −2. Again, $true \in I_{p,r}$ holds and is uniform since $G(true) = \emptyset$ (pruning
1). The algorithm finishes the exploration of sequence p.q.r and then backtracks
with q at state 0. At state 5 the algorithm selects process r (p is in the sleep
set of 5 since it is propagated down from the q.p in sleep(ε)). It then checks
the possible race between q and r, which is again discarded (pruning 1), since
transitive uniformity of the independence of q and r can be proved: we have that
$(z \leq -2) \in I_{q,r}$ holds in z = −2 and $W(p) \cap G(z \leq -2) = \emptyset$, where p is the only
enabled event out of {q, r} and W(p) = {x}. This avoids adding r to back(ε).
Finally, at state 5, p is propagated down in the new sleep set (pruning 2), since
as before $true \in I_{p,r}$ ensures transitive uniformity. The exploration therefore
finishes at state 6.


Overall, on our working example, CDPOR has been able to explore only one
complete sequence p.q.r and the partial sequence q.r (a total of 6 states). The
latter one could be avoided if a more precise sufficient condition for uniformity
were provided which, in particular, is able to detect that the independence of p and
q in ε is transitively uniform, i.e., it still holds after r (even if r writes variable z).
Theorem 2 (soundness). For each Mazurkiewicz trace T defined by the
happens-before relation, Explore(ε, ∅) in Algorithm 1 explores a complete execution
sequence T' that reaches the same final state as T.


4 Automatic Generation of ICs Using SMT 


Generating ICs amounts to proving (conditional) program equivalence w.r.t. the 
global memory. While the problem is very hard in general, proving equivalence 
of smaller blocks of code becomes more tractable. This section introduces a 
novel SMT-based approach to synthesize ICs between pairs of atomic blocks of 
code. Our ICs can be used within any transformation or analysis tool (beyond
DPOR) which can gain accuracy or efficiency by knowing that fragments of
code (conditionally) commute. Section 4.1 first describes the inference for basic 
blocks; Sect. 4.2 extends it to handle process creation and Sect. 4.3 outlines other 
extensions, like loops, method invocations and data structures. 


4.1 The Basic Inference 


In this section we consider blocks of code containing conditional statements and 
assignments using linear integer arithmetic (LIA) expressions. The first step to 
carry out the inference is to transform q and r into two respective deterministic
Transition Systems (TSs), $T_q$ and $T_r$ (note that q and r are assumed to be
deterministic), and compose them in both orders, $T_{q\cdot r}$ and $T_{r\cdot q}$. Consider
r and q in Fig. 1 whose associated TSs are (primed variables represent the final 
value of the variables): 


$$T_q:\quad z \geq 0 \rightarrow z' = x; \qquad z < 0 \rightarrow z' = z;$$
$$T_r:\quad true \rightarrow x' = x + 1,\ z' = z + 1;$$


The code to be analyzed is the composition of T} and T, in both orders: 


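$$T_{q\cdot r}:\quad z \geq 0 \rightarrow x' = x + 1,\ z' = x + 1; \qquad z < 0 \rightarrow x' = x + 1,\ z' = z + 1;$$
$$T_{r\cdot q}:\quad z \geq -1 \rightarrow x' = x + 1,\ z' = x + 1; \qquad z < -1 \rightarrow x' = x + 1,\ z' = z + 1;$$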


In what follows we denote by $T_{a\cdot b}$ the deterministic TS obtained from the
concatenation of the blocks a and b, such that all variables are assigned in one
instruction using parallel assignment. We let $A|_G$ be the restriction to the global
memory of the assignments in A (i.e., ignoring the effect on local variables). The 
following definition provides an SMT formula over LIA (a boolean formula where 
the atoms are equalities and inequalities over linear integer arithmetic expres- 
sions) which encodes the independence between the two blocks. 


Definition 5 (IC generation). Let us consider two atomic blocks q and r and
a global memory G, and let $C_i \rightarrow A_i$ (resp. $C'_j \rightarrow A'_j$) be the transitions in $T_{q\cdot r}$
(resp. $T_{r\cdot q}$). We obtain $F_{q,r}$ as the SMT formula: $\bigvee_{i,j} (C_i \wedge C'_j \wedge A_i|_G = A'_j|_G)$.
Intuitively, the SMT encoding in the above definition has as solutions all those
states where both a condition $C_i$ of a transition in $T_{q\cdot r}$ and a condition $C'_j$ of a
transition in $T_{r\cdot q}$ hold (and hence are compatible) and the final global state after
executing all instructions in the two transitions (denoted $A_i$ and $A'_j$) remains the same.

Next, we generate the constraints of the independence condition $I_{q,r}$ by
obtaining a compact representation of all models over linear arithmetic atoms
(computed by an allSAT SMT solver) satisfying $F_{q,r}$. In particular, we add a
constraint in $I_{q,r}$ for every obtained model.


Example 6. In the example, we have the TSs with conditions and assignments:

$$\begin{array}{ll|ll}
T_{q\cdot r}: & C_1: z \geq 0,\ A_1: x' = x+1, z' = x+1 & T_{r\cdot q}: & C'_1: z \geq -1,\ A'_1: x' = x+1, z' = x+1 \\
 & C_2: z < 0,\ A_2: x' = x+1, z' = z+1 & & C'_2: z < -1,\ A'_2: x' = x+1, z' = z+1
\end{array}$$

and we obtain a set with three constraints $I_{q,r} = \{(z \geq 0), (z = x), (z < -1)\}$
by computing all models satisfying the following resulting formula:

$$\begin{array}{l}
(z \geq 0 \wedge z \geq -1 \wedge x+1 = x+1 \wedge x+1 = x+1)\ \vee \\
(z \geq 0 \wedge z < -1 \wedge x+1 = x+1 \wedge x+1 = z+1)\ \vee \\
(z < 0 \wedge z \geq -1 \wedge x+1 = x+1 \wedge z+1 = x+1)\ \vee \\
(z < 0 \wedge z < -1 \wedge x+1 = x+1 \wedge z+1 = z+1)
\end{array}$$

The second conjunction is unsatisfiable since there is no model with both $C_1$ and
$C'_2$. On the other hand, the equalities of the first and the last conjunctions always
hold, which gives us the constraints $z \geq 0$ and $z \leq -2$. Finally, in the third conjunction
all equalities hold when x = z, which gives us the third constraint as a result of our SMT encoding.
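The actual inference relies on an allSAT SMT solver. As a solver-free illustration, the Python sketch below evaluates $F_{q,r}$ of Definition 5 pointwise from the transitions of $T_{q\cdot r}$ and $T_{r\cdot q}$ above, and compares it with the disjunction of the three constraints of $I_{q,r}$ on a bounded grid of states. This brute-force enumeration only checks agreement on the grid; it does not synthesize constraints the way the solver-based procedure does.

```python
# Transitions (guard C, assignment A) of T_{q.r} and T_{r.q}; A maps the
# initial (x, z) to the final global values (x', z').
T_QR = [
    (lambda x, z: z >= 0, lambda x, z: (x + 1, x + 1)),
    (lambda x, z: z < 0, lambda x, z: (x + 1, z + 1)),
]
T_RQ = [
    (lambda x, z: z >= -1, lambda x, z: (x + 1, x + 1)),
    (lambda x, z: z < -1, lambda x, z: (x + 1, z + 1)),
]


def F(x, z):
    """F_{q,r} at one concrete state: some pair (C_i, C'_j) holds and
    A_i and A'_j leave the same global memory."""
    return any(c(x, z) and c2(x, z) and a(x, z) == a2(x, z)
               for c, a in T_QR for c2, a2 in T_RQ)


def I_QR(x, z):
    """Disjunction of the constraints z >= 0, z = x, z < -1 of Example 6."""
    return z >= 0 or z == x or z < -1


grid = [(x, z) for x in range(-6, 7) for z in range(-6, 7)]
print(all(F(x, z) == I_QR(x, z) for x, z in grid))  # True
```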


Note that, as in this case $F_{q,r}$ describes not only a sufficient but also a necessary
condition for independence, the obtained constraints are also a sufficient
and necessary condition for independence. This allows removing line 14 in the
algorithm, since the context-sensitive check will fail if line 11 does. However, the 
next extensions do not ensure that the generated ICs are necessary conditions. 


4.2 IC for Blocks with Process Creation 


Consider the following two methods whose body constitutes an atomic block 
(e.g., the lock is taken at the method start and released at the return). They 
are inspired by a highly concurrent computation of Fibonacci numbers used in the
experiments. Variables nr and r are global to all processes:


fib(int v) {
  if (v<1) {spawn(res(v));}
  else {spawn(fib(v-1));
        spawn(fib(v-2));}
}

res(int v) {
  if (nr>0) {nr=0; r=v;}
  else {spawn(res(r+v));
        r=0; nr=1;}
}


We now want to infer $I_{fib(v),fib(v_1)}$, $I_{fib(v),res(v_1)}$ and $I_{res(v),res(v_1)}$. The first step is to
obtain, for each block r, a TS with uninterpreted functions, denoted $T^S_r$, in
which transitions are of the form $C \rightarrow (A, S)$, where A are the parallel assignments
as in Sect. 4.1, and S is a multiset containing calls to fresh uninterpreted
functions associated to the processes spawned within the transition (i.e., a process
creation spawn(P) is associated to an uninterpreted function spawn_P).
$$T^S_{fib(v)}:\quad v < 1 \rightarrow (skip, \{spawn\_res(v)\}); \qquad v \geq 1 \rightarrow (skip, \{spawn\_fib(v-1), spawn\_fib(v-2)\})$$
$$T^S_{res(v)}:\quad nr > 0 \rightarrow (nr' = 0, r' = v, \{\}); \qquad nr \leq 0 \rightarrow (nr' = 1, r' = 0, \{spawn\_res(r+v)\})$$
The following definition extends Definition 5 to handle process creation. Intu- 
itively, it associates a fresh variable to each different element in the multisets 
(mapping P’ below) and enforces equality among the multisets. 


Definition 6 (IC generation with process creation). Let us consider $T^S_{q\cdot r}$
and $T^S_{r\cdot q}$. We define $P = \{ s \mid s \in S, \text{ with } C \rightarrow (A, S) \in T^S_{q\cdot r} \cup T^S_{r\cdot q} \}$.
Let P' be a mapping from the elements in P to fresh variables, and P'(S) be
the replacement of the elements in the multiset S applying the mapping P'. Let
$C_i \rightarrow (A_i, S_i)$ (resp. $C'_j \rightarrow (A'_j, S'_j)$) be the transitions in $T^S_{q\cdot r}$ (resp. $T^S_{r\cdot q}$). We
obtain $F_{q,r}$ as the SMT formula: $\bigvee_{i,j} (C_i \wedge C'_j \wedge A_i|_G = A'_j|_G \wedge P'(S_i) = P'(S'_j))$.


For simplicity and efficiency, we consider that = corresponds to the syntactic
equality of the multisets. However, in order to improve the precision of the encoding,
we apply P' to $S_i$ and $S'_j$, replacing two process creations by the same variable
if they are equal modulo associativity and commutativity (AC) of arithmetic
operators and after substituting the equalities already imposed by $A_i|_G = A'_j|_G$
(see example below). A more precise treatment can be achieved by using equality
with uninterpreted functions (EUF) to compare the multisets of processes.


Example 7. Let us show how we apply the above definition to infer $I_{res(v),res(v_1)}$.
We first build $T^S_{res(v)\cdot res(v_1)}$ from $T^S_{res(v)}$ by composing it with itself:

$$nr \leq 0 \rightarrow (nr' = 0, r' = v_1, \{spawn\_res(r+v)\})$$
$$nr > 0 \rightarrow (nr' = 1, r' = 0, \{spawn\_res(v+v_1)\})$$

and $T^S_{res(v_1)\cdot res(v)}$, which is like the one above but exchanging v and $v_1$. Next, we
define $P' = \{spawn\_res(r+v) \mapsto x_1, spawn\_res(v+v_1) \mapsto x_2, spawn\_res(r+v_1) \mapsto x_3, spawn\_res(v_1+v) \mapsto x_4\}$
and apply it with the improvement described above:

$$\begin{array}{l}
(nr \leq 0 \wedge nr \leq 0 \wedge 0 = 0 \wedge v = v_1 \wedge \{x_1\} = \{x_1\})\ \vee \\
(nr \leq 0 \wedge nr > 0 \wedge 0 = 1 \wedge v_1 = 0 \wedge \{x_1\} = \{x_4\})\ \vee \\
(nr > 0 \wedge nr \leq 0 \wedge 1 = 0 \wedge 0 = v \wedge \{x_2\} = \{x_3\})\ \vee \\
(nr > 0 \wedge nr > 0 \wedge 1 = 1 \wedge 0 = 0 \wedge \{x_2\} = \{x_2\})
\end{array}$$

Note that the second and the third conjunctions are unfeasible and hence can
be removed from the formula. In the first one, $spawn\_res(r+v_1)$ is replaced by
$x_1$ (instead of $x_3$) since we can substitute $v_1$ by v, as $v = v_1$ is imposed in the
conjunction, and in the fourth one $spawn\_res(v_1+v)$ is replaced by $x_2$ (instead
of $x_4$) since it is equal modulo AC to $spawn\_res(v+v_1)$. Then we finally have

$$(nr \leq 0 \wedge nr \leq 0 \wedge 0 = 0 \wedge v = v_1) \vee (nr > 0 \wedge nr > 0 \wedge 1 = 1 \wedge 0 = 0)$$

As before, $I_{res(v),res(v_1)} = \{(nr > 0), (v = v_1)\}$ is then obtained by computing all
satisfying models. In the same way we obtain $I_{fib(v),res(v_1)} = I_{fib(v),fib(v_1)} = \{true\}$.
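The comparison of created processes in Example 7 can be sketched in Python as follows. Under the simplifying assumption that spawn arguments are sums of atoms, equality modulo AC of + reduces to comparing sorted tuples of summands, and the equalities imposed by $A_i|_G = A'_j|_G$ are applied beforehand as a substitution. The function names are hypothetical; the encoding described above (or EUF) handles more general terms.

```python
from collections import Counter


def spawn_term(name, *summands):
    """spawn_name(s1 + s2 + ...) normalised modulo AC of '+'."""
    return (name, tuple(sorted(summands)))


def substitute(term, eqs):
    """Apply equalities such as {v1: v} to the arguments of a spawn term."""
    name, args = term
    return spawn_term(name, *(eqs.get(a, a) for a in args))


def same_spawns(s1, s2, eqs=None):
    """Multiset equality of created processes after substitution."""
    eqs = eqs or {}
    return (Counter(substitute(t, eqs) for t in s1)
            == Counter(substitute(t, eqs) for t in s2))


# First disjunct: spawn_res(r + v) vs spawn_res(r + v1), with v = v1 imposed.
print(same_spawns([spawn_term("res", "r", "v")],
                  [spawn_term("res", "r", "v1")], {"v1": "v"}))  # True
# Fourth disjunct: spawn_res(v + v1) vs spawn_res(v1 + v), equal modulo AC.
print(same_spawns([spawn_term("res", "v", "v1")],
                  [spawn_term("res", "v1", "v")]))  # True
# Without v = v1, spawn_res(r + v) and spawn_res(r + v1) differ.
print(same_spawns([spawn_term("res", "r", "v")],
                  [spawn_term("res", "r", "v1")]))  # False
```

Using a multiset (`Counter`) rather than a set matters when the same process is spawned twice in one transition, as in the two `fib` spawns of `fib(v)` when their arguments coincide.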
The following theorem states the soundness of the inference of ICs, which holds
by construction of the SMT formula.


Theorem 3 (soundness of independence conditions). Given the assumptions
in Definition 6, if $\exists C \in I_{q,r}$ s.t. C(S) holds, then $S \xrightarrow{q\cdot r} S'$ and $S \xrightarrow{r\cdot q} S'$.


We will also get a necessary condition in those instances where the use of syntactic
equality modulo AC on the multisets of created processes (as described
above) does not lose precision. This can be checked when building the encoding.


4.3 Other Extensions 


We abstract loops from the code of the blocks so that we can handle them 
as uninterpreted functions similarly to Definition 6. Basically, for each loop, we 
generate as many uninterpreted functions as variables it modifies (excluding 
local variables of the loop) plus one to express all processes created inside the 
loop. The functions have as arguments the variables accessed by the loop (again 
excluding local variables). This transformation allows us to represent that each 
variable might be affected by the execution of the loop over some parameters, 
and then check in the reverse trace whether we get to the loop over the same 
parameters. 


Definition 7 (loop extraction for IC generation). Let us consider a loop
L that accesses variables $x_1, \ldots, x_n$ and modifies variables $y_1, \ldots, y_m$ (excluding
local loop variables), and let $l_1, \ldots, l_{m+1}$ be fresh function symbol names. We
replace L by the following code:

$$x'_1 = x_1; \ldots; x'_n = x_n;\ y_1 = l_1(x'_1, \ldots, x'_n); \ldots; y_m = l_m(x'_1, \ldots, x'_n);$$
$$spawn(l_{m+1}(x'_1, \ldots, x'_n)); \quad \text{(only if there are spawn operations inside the loop)}$$


Existing dependency analysis can be used to infer the subset of $x_1, \ldots, x_n$ that
affects each $y_j$, achieving more precision with a small pre-computation overhead.

The treatment of method invocations (or function calls) to be executed atom- 
ically within the considered blocks can be done analogously to loops by intro- 
ducing one fresh function for every (non-local) variable that is modified within 
the method call and one more for the result. The parameters of these new func- 
tions are the original ones plus one for each accessed (non-local) variable. After 
the transformations for both loops and calls described above, we have TSs with 
function calls that are treated as uninterpreted functions in a similar way to 
Definition 6. However, these functions can now occur in the conditions and the
assignments of the TS. To handle them, we again use a mapping P'' to remove
all function calls from the TS and replace them by fresh integer variables. After
that, the encoding is like in Definition 6, and we obtain an SMT formula over
LIA, which is again sent to the allSAT SMT solver. Once we have obtained
the models, we replace the introduced fresh variables back by the function calls
using the mapping P''. Several simplifications on equalities involving function
calls can be done before and after invoking the solver to improve the result. As a
final remark, data structures like lists or maps have been handled by expressing
their uses as function calls, hence obtaining constraints that include conditions 
on them. 


5 Experiments 


In this section we report on experimental results that compare the performance of 
three DPOR algorithms: SDPOR [1,2], CSDPOR [3] and our proposal CDPOR. 
We have implemented and experimentally evaluated our method within the
SYCO tool [3], a systematic testing tool for message-passing concurrent
programs. SYCO can be used online through its web interface available at
http://costa.fdi.ucm.es/syco. To generate the ICs, SYCO calls a new feature of
the VeryMax program analyzer [6], which uses Barcelogic [5] as its SMT solver. As
benchmarks, we have borrowed the examples from [3] (available online from the 
previous url) that were used to compare SDPOR with CSDPOR. They are clas- 
sical concurrent applications: several concurrent sorting algorithms (QS, MS, 
PS), concurrent Fibonacci Fib, distributed workers Pi, a concurrent registration 
system Reg and database DBP, and a producer-consumer interaction BB. These
benchmarks feature the typical concurrent programming methodology in which 
computations are split into smaller atomic subcomputations which concurrently 
interleave their executions, and which work on the same shared data. Therefore,
the concurrent processes are highly interfering, and both inferring ICs and
applying DPOR algorithms to them become challenging.

We have executed each benchmark with input parameters of increasing size. A
timeout of 60 s is used and, when it is reached, we write >X to indicate that for
the corresponding measure we encountered X units up to that point (i.e., it is at
least X). Table 1 shows the results of the executions for 6 different inputs.
Column Tr shows the number of traces, S the number of states that the algorithms
explore, and T the time in seconds it takes to compute them. For CDPOR, we also
show the time T^smt of inferring the ICs (since the inference is performed once
for all executions, it is only shown in the first row). Times are obtained on
an Intel(R) Core(TM) i7 CPU at 2.5GHz with 8GB of RAM (Linux Kernel
5.4.0). Columns G^s and G^cs show the time speedup of CDPOR over SDPOR
and CSDPOR, respectively, computed by dividing each respective T by the time
T of CDPOR. Column G^cs_smt shows the time speedup over CSDPOR including
T^smt in the time of CDPOR. We can see from the speedups that the gains of
CDPOR increase exponentially with the size of the input in all examples. When
compared with CSDPOR, we achieve reductions of up to 4 orders of magnitude for
the largest inputs on which CSDPOR terminates (e.g., Pi, QS). It is important
to highlight that the number of non-unitary sequences stored in sleep sets is 0
in every benchmark except BB, for which it remains quite low (namely, for
BB(11) the peak is 22).
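For reference, the three speed-up columns can be written as ratios of the exploration times T of the respective algorithms, with T^smt denoting the one-off IC inference time. This only restates the column definitions given above; it is not an additional measurement.

```latex
G^{s} = \frac{T_{\mathrm{SDPOR}}}{T_{\mathrm{CDPOR}}}, \qquad
G^{cs} = \frac{T_{\mathrm{CSDPOR}}}{T_{\mathrm{CDPOR}}}, \qquad
G^{cs}_{smt} = \frac{T_{\mathrm{CSDPOR}}}{T_{\mathrm{CDPOR}} + T^{smt}}
```

Because a run that hits the timeout contributes its 60 s cap to the numerator, any ratio involving such a run is only a lower bound, which is why those entries carry the > prefix.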

W.r.t. SDPOR, we achieve reductions of 4 orders of magnitude even for
smaller inputs on which SDPOR terminates (e.g., PS). Note that since most
examples reach the timeout, the gains are at least the ones we show; thus the
Table 1. Experimental evaluation


SDPOR CSDPOR CDPOR Speed-up 
Bench. Tr S T Tr S T Tr S T T^smt G^s G^cs G^cs_smt
Fib(6) 3k 26k 7.7 244 0.1 1 50 0.03 0.12 366 4 0.6 
Fib(7) >13k >160k 60.0 551 0.3 1 82 0.05 >1364 6 1.4 
Fib(8) >8k >101k 60.0 1 2k 0.7 1 134 0.12 >527 6 3.0 
Fib(9) >4k >51k 60.0 3k 2.8 1 218 0.25 >242 12 7.5 
Fib(10) >2k >27k 60.0 8k 11.5 1 354 0.69 >88 17 14.3 
Fib(14) >10 >3k 60.0 > >4k 60.0 1 3k 42.67 >2 >2 >1.5
QS(10) 512 9k 2.6 4k 1.0 1 38 0.02 11.99 199 71 0.1 
QS(13) 5k 91k 29.5 29k 7.9 1 50 0.03 1474 395 0.7 
QS(15) >7k >157k 60.0 115k 42.6 1 58 0.05 >1500 1064 3.6 
QS(20) >4k >98k 60.0 >1 >148k 60.0 1 78 0.04 >1539 >1539 >5.0 
QS(25) >3k >96k 60.0 >1 >133k 60.0 1 98 0.06 >1017 >1017 >5.0
QS(200) >5 >2k 60.0 >1 >87k 60.0 1 798 4.45 >14 >14 >3.7
MS(10) 628 7k 2.9 87 0.1 1 42 0.02 0.12 175 6 0.7
MS(30) >6k >55k 60.0 974 1.0 1 118 0.13 >484 8 4.0 
MS(65) >2k >16k 60.0 3k 3.5 1 258 0.47 >131 8 6.1 
MS(100) >2k >15k 60.0 >1 >19k 60.0 1 398 0.97 >63 >63 >55.6 
MS(150) >2k >21k 60.0 >1 >18k 60.0 1 598 2.21 >28 >28 >26.0 
MS(220) >341 >6k 60.0 > >5k 60.0 1 878 4.49 >14 >14 >13.1 
Pi(7) 6k 49k 16.2 74 2k 0.4 1 23 0.02 0.05 1243 27 5.6
Pi(8) >10k >105k 60.0 264 5k 1.7 1 26 0.02 >4616 128 26.9 
Pi(9) >11k >120k 60.0 2k 9k 7.0 1 29 0.02 >4000 465 108.9 
Pi(10)  >10k >128k 60.0 6k 91k 45.2 1 32 0.02 >3530 2655 683.7 
Pi(12) >9k >122k 60.0 >7k >128k 60.0 1 38 0.03 >2400 >2400 >810.9 
Pi(20) >5k >101k 60.0 >5k >115k 60.0 1 62 0.09 >723 >723 >454.6 
PS(4) 288 2k 0.4 2 41 0.1 1 16 0.01 0.59 72 2 0.1
PS(5) 35k 156k 43.2 8 42 0.1 1 22 0.01 5391 5 0.1 
PS(6) >32k >141k 60.0 72 2k 0.4 1 29 0.02 >4286 28 0.7 
PS(7) >29k >130k 60.0 2k 28k 7.5 1 37 0.03 >2858 357 12.3 
PS(9) >25k >109k 60.0 >11k >165k 60.0 1 56 0.06 >1053 >1053 >92.9 
PS(11) >23k >103k 60.0 >9k >132k 60.0 1 79 0.09 >690 >690 >88.8 
DBP(5) 243 8k 2.0 133 4k 1.0 32 193 0.08 0.09 27 14 6.2 
DBP(6) 729 33k 8.2 308 11k 3.2 64 386 0.16 53 21 13.3 
DBP(7) 3k 134k 36.9 699 32k 10.8 128 771 0.33 113 33 26.2 
DBP(8) >4k >157k 60.0 2k 91k 36.1 256 2k 0.79 >77 47 41.6 
DBP(10) >6k >116k 60.0 >4k >125k 60.0 2k 7k 3.23 >19 >19 >18.2 
DBP(12) >9k >79k 60.0 >8k >111k 60.0 5k 25k 15.79 >4 >4 >3.8
BB(6) 924 4k 1.3 215 2k 0.5 64 382 0.91 0.18 2 1 0.4 
BB(7) 4k 13k 4.3 580 4k 1.2 128 830 1.49 3 1 0.8 
BB(8) 13k 49k 17.2 2k 11k 3.3 256 2k 2.79 7 2 1.1 
BB(9) >41k >156k 60.0 5k 30k 9.0 512 4k 6.15 >10 2 1.5 
BB(10) >46k >176k 60.0 12k 81k 23.6 2k 9k 12.50 >5 2 1.9 
BB(11) >44k >169k 60.0 >44k >169k 60.0 3k 18k 25.74 >3 >3 >2.4


concrete numbers shown should be read as lower bounds. In some examples
(e.g., BB, MS), although the gains are linear for small inputs, when the size
of the problem increases both SDPOR and CSDPOR time out, while CDPOR
can still handle them efficiently.

Similar reductions are obtained for the number of states explored. In this case,
the system times out when it has memory problems, and the computation stops
progressing (hence the number of explored states does not increase with the input
any more). As regards the time to infer the annotations, T^smt, we observe that in
most cases it is negligible compared to the exploration time of the other methods.
QS is the only example that needs some seconds to be solved; this is due to
the presence of several nested conditional statements combined with the use of
built-in functions for lists, which makes the generated SMT encoding harder for
the solver and for the subsequent simplification step. Note that the inference is a
pre-process that does not add complexity to the actual DPOR algorithm.


6 Related Work and Conclusions 


The notion of conditional independence in the context of POR was first intro- 
duced in [11,15]. Also [12] provides a similar strengthened dependency definition. 
CSDPOR was the first approach to exploit this notion within the state-of-the-art 
DPOR algorithm. We advance this line of research by fully integrating
conditional independence within the DPOR framework, using independence
constraints (ICs) together with the notion of transitive uniform conditional
independence, which ensures that the ICs hold along the whole execution
sequence. Both ICs and transitive uniformity can be approximated statically and
checked dynamically, making them effectively applicable within the dynamic framework. The
work in [14,21] generated for the first time ICs for processes with a single instruc- 
tion following some predefined patterns. This is a problem strictly simpler than 
our inference of ICs, both in the type of IC generated (restricted to the patterns)
and in the single-instruction blocks they consider. Furthermore, our approach
using an allSAT SMT solver is different from the CEGAR approach in [4]. The
ICs are used in [14,21] for SMT-based bounded model checking, an approach 
to model checking fundamentally different from our stateless model checking 
setting. As a consequence, ICs are used in a different way: in our case, with no
bounds on the number of processes or on derivation lengths, but requiring a
uniformity condition on independence in order to ensure soundness. Maximal causality
reduction [13] is technically quite different from CDPOR as it integrates SMT 
solving within the dynamic algorithm. 

Finally, data-centric DPOR (DCDPOR) [7] presents a new DPOR algorithm
based on a different notion of dependency, according to which the equivalence
classes of derivations are based on the read-write pairs of variables. Consider the
following three simple processes {p, q, r} and the initial state x = 0:

p: write(x=5), q: write(x=5), r: read(x). In DCDPOR, we have only
three different observation functions: (r,x) (reading the initial value), (r,p)
(reading the value that p writes), and (r,q) (reading the value that q writes).
Therefore, this notion of relational independence is finer-grained than the
traditional one in DPOR. However, DCDPOR does not consider conditional
dependency, i.e., it does not realize that (r,p) and (r,q) are equivalent, and hence
that only two explorations are required (and explored by CDPOR). In conclusion,
our approach and DCDPOR can complement each other: our approach would
benefit from using a dependency based on the read-write pairs as proposed in
DCDPOR, and DCDPOR would benefit from using conditional independence
as proposed in our work. It remains as future work to study this integration.
Related to DCDPOR, [16] extends optimal DPOR with observers. For the
previous example, [16] needs to explore five executions: r.p.q and r.q.p are equivalent
because p and q do not have any observer. Another improvement orthogonal to
ours is to inspect dependencies over chains of events, as in [17,19].
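The counting argument in the three-process example can be checked mechanically. The following C sketch enumerates all 3! = 6 interleavings of p: write(x=5), q: write(x=5) and r: read(x) from x = 0, and counts (i) the distinct writers r can read from, including the initial value, which mirrors the three DCDPOR observation functions, and (ii) the distinct values r can actually observe, which is the coarser distinction that conditional independence exploits. The integer encoding of processes is an illustrative assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the example: p and q write 5 to x, r reads x. */
enum { P, Q, R, NPROC };
enum { INIT = -1 };   /* pseudo-writer standing for the initial value x = 0 */

/* The 3! = 6 orders in which the three atomic steps can execute. */
static const int orders[6][NPROC] = {
    {P, Q, R}, {P, R, Q}, {Q, P, R}, {Q, R, P}, {R, P, Q}, {R, Q, P}
};

/* Executes one order from x = 0; reports whom r read from and the value. */
static void run(const int order[NPROC], int *writer, int *value) {
    int x = 0, last = INIT;
    for (int i = 0; i < NPROC; i++) {
        if (order[i] == R) {
            *writer = last;
            *value = x;
        } else {
            x = 5;           /* both p and q perform write(x=5) */
            last = order[i];
        }
    }
}

/* Counts distinct reads-from pairs (r, writer) and distinct values r reads. */
void count_observations(int *n_writers, int *n_values) {
    bool seen_writer[NPROC + 1] = {false};  /* slot w + 1, so INIT maps to 0 */
    bool seen_value[6] = {false};           /* values are 0 or 5 */
    *n_writers = 0;
    *n_values = 0;
    for (int k = 0; k < 6; k++) {
        int w = INIT, v = 0;
        run(orders[k], &w, &v);
        assert(v == 0 || v == 5);
        if (!seen_writer[w + 1]) { seen_writer[w + 1] = true; (*n_writers)++; }
        if (!seen_value[v]) { seen_value[v] = true; (*n_values)++; }
    }
}
```

Calling `count_observations` yields 3 reads-from classes but only 2 observed values, matching the three DCDPOR observation functions and the two explorations needed when (r,p) and (r,q) are recognized as equivalent.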
Abstract. Vehicle-to-Vehicle (V2V) communications is a “connected
vehicles” standard that will likely be mandated in the U.S. within the 
coming decade. V2V, in which automobiles broadcast to one another, 
promises improved safety by providing collision warnings, but it also 
poses a security risk. At the heart of V2V is the communication messag- 
ing system, specified in SAE J2735 using the Abstract Syntax Notation 
One (ASN.1) data-description language. Motivated by numerous previ- 
ous ASN.1 related vulnerabilities, we present the formal verification of 
an ASN.1 encode/decode pair. We describe how we generate the imple- 
mentation in C using our ASN.1 compiler. We define self-consistency for 
encode/decode pairs that approximates functional correctness without 
requiring a formal specification of ASN.1. We then verify self-consistency 
and memory safety using symbolic simulation via the Software Analysis 
Workbench. 


Keywords: Automated verification · ASN.1 · Vehicle-to-Vehicle ·
LLVM · Symbolic execution · SMT solver


1 Introduction 


At one time, automobiles were mostly mechanical systems. Today, a modern 
automobile is a complex distributed computing system. A luxury car might con- 
tain tens of millions of lines of code executing on 50-70 microcontrollers, also 
known as electronic control units (ECUs). A midrange vehicle might contain at 
least 25 ECUs, and that number continues to grow. In addition, various radios 
such as Bluetooth, Wifi, and cellular provide remote interfaces to an automobile. 

With all that code and remotely accessible interfaces, it is no surprise that
software vulnerabilities can be exploited to gain unauthorized access to a
vehicle. Indeed, in a study of a typical midrange vehicle, Checkoway et al. found,
for every remote interface, some software vulnerability that provided an
attacker access to the vehicle’s internal systems [4]. Furthermore, in each case,
once the interface was exploited, the attackers could parlay the exploit to make
arbitrary modifications to other ECUs in the vehicle. Such modifications could
include disabling lane assist, locking/unlocking doors, and disabling the brakes.
Regardless of the interface exploited, full control can be gained.

Meanwhile, the U.S. Government is proposing a new automotive standard for 
vehicle-to-vehicle (V2V) communications. The idea is for automobiles to have 
dedicated short-range radios that broadcast a Basic Safety Message (BSM)— 
e.g., vehicle velocity, trajectory, brake status, etc.—to other nearby vehicles 
(within approximately 300m). V2V is a crash prevention technology that can 
be used to warn drivers of unsafe situations—such as a stopped vehicle in the 
roadway. Other potential warning scenarios include left-turn warnings when line- 
of-sight is blocked, blind spot/lane change warnings, and do-not-pass warnings. 
In addition to warning drivers, such messages could have even more impact for 
autonomous or vehicle-assisted driving. The U.S. Government estimates that if 
applied to the full national fleet, approximately one-half million crashes and 1,000 
deaths could be prevented annually [15]. We provide a more detailed overview 
of V2V in Sect. 2. 

While V2V communications promise to make vehicles safer, they also provide 
an additional security threat vector by introducing an additional radio and more 
software on the vehicle. 

This paper presents initial steps in ensuring that V2V communications are 
implemented securely. We mean “secure” in the sense of having no flaws that 
could be a vulnerability; confidentiality and authentication are provided in other 
software layers and are not in scope here. Specifically, we focus on the security 
of encoding and decoding the BSM. The BSM is defined using ASN.1, a data 
description language in widespread use. It is not an exaggeration to say that 
ASN.1 is the backbone of digital communications; ASN.1 is used to specify 
everything from the X.400 email protocol to voice over IP (VoIP) to cellular 
telephony. While ASN.1 is pervasive, it is a complex language that has been 
amended substantially over the past few decades. Over 100 security
vulnerabilities have been reported for ASN.1 implementations in MITRE’s Common
Vulnerabilities and Exposures (CVE) database [14]. We introduce ASN.1 and its security
vulnerabilities in Sect. 3. 

This paper presents the first work in formally verifying a subsystem of V2V.
Moreover, despite the pervasiveness and security-critical nature of ASN.1, it is
the first work we are aware of in which any ASN.1 encoder (which translates ASN.1
messages into a byte stream) and decoder (which recovers an ASN.1 message from
a byte stream) has been formally verified. The only previous work in this
direction is by Barlas et al., who developed a translator from ASN.1 into CafeOBJ,
an algebraic specification and verification system [1]. Their motivation was to
allow reasoning about broader network properties, of which an ASN.1
specification may be one part; their work does not address ASN.1 encoding or
decoding and appears to be preliminary.

The encode/decode pair is first generated by Galois’ ASN.1 compiler, part of 
the High-Assurance ASN.1 Workbench (HAAW). The resulting encode/decode 
pair is verified using Galois’ open source Software Analysis Workbench (SAW),
a state-of-the-art symbolic analysis engine [6]. Both tools are further described 
in Sect. 4. 

In Sect. 5 we state the properties verified: we introduce the notion of
self-consistency for encode/decode verification, which approximates functional
correctness without requiring a formal specification of ASN.1 itself. Then, in
Sect. 6, we describe our approach to verifying the self-consistency and memory
safety of the C implementation of the encode/decode pair using compositional
symbolic simulation as implemented in SAW. In Sect. 7 we put our results into
context.


2 Vehicle-to-Vehicle Communications 


As noted in the introduction, V2V is a short-range broadcast technology with
the purpose of making driving safer by providing early warnings. In the V2V
system, the BSM is the key message broadcast, at a frequency of up to 10 Hz
(possibly lower due to congestion control). The BSM must be compatible
between all vehicles, so it is standardized under SAE J2735 [7].

The BSM is divided into Part I and Part II, and both are defined with ASN.1. 
Part I is called the BSM Core Data and is part of every message broadcast. Part I 
includes positional data (latitude, longitude, and elevation), speed, heading, and 
acceleration. Additionally it includes various vehicle state information including 
transmission status (e.g., neutral, park, forward, reverse), the steering wheel 
angle, braking system status (e.g., Are the brakes applied? Are anti-lock brakes 
available/engaged?, etc.), and vehicle size. Our verification, described in Sect. 6, 
is over Part I. 

Part II is optional and extensible. Part II could include, for example, 
regionally-relevant data. It can also include additional vehicle safety data, includ- 
ing, for example, which of the vehicle’s exterior lights are on. It may include 
information about whether a vehicle is a special vehicle or performing a critical 
mission, such as a police car in an active pursuit or an ambulance with a critical 
patient. It can also include weather data and obstacle detection data.


3 ASN.1 


Abstract Syntax Notation One (ASN.1) is a standardized data description lan- 
guage in widespread usage. Our focus in this section is to give a sense of what 
ASN.1 is as well as its complexity. We particularly focus on aspects that have 
led to security vulnerabilities. 


3.1 The ASN.1 Data Description Language and Encoding Schemes 


ASN.1 was first standardized in 1984, with many revisions since. ASN.1 is a data 
description language for specifying messages; although it can express relations 
between request and response messages, it was not designed to specify stateful
protocols. While ASN.1 is “just” a data description language, it is quite large
and complex. Indeed, merely parsing ASN.1 specifications is difficult. Dubuisson
notes that the grammar of ASN.1 (1997 standard) results in nearly 400
shift/reduce conflicts and over 1,300 reduce/reduce conflicts in an LALR(1) parser
generator, while an LL(k) parser generator results in over 200 production rules
beginning with the same lexical token [8]. There is a by-hand transformation of
the grammar into an LL(1)-compliant grammar, albeit with no formal proof of their
equivalence [9].

Not only is the syntax of ASN.1 complex, but so is its semantics. ASN.1 
contains a rich datatype language. There are at least 26 base types, including 
arbitrary integers, arbitrary-precision reals, and 13 kinds of string types. Com- 
pound datatypes include sum types (e.g., CHOICE and SET), records with subtyp- 
ing (e.g., SEQUENCE), and recursive types. There is a complex constraint system 
(ranges, unions, intersections, etc.) on the types. Subsequent ASN.1 revisions 
support open types (providing a sort of dynamic typing), versioning to support 
forward/backward compatibility, user-defined constraints, parameterized speci- 
fications, and so-called information objects which provide an expressive way to 
describe relations between types. 

So far, we have only highlighted the data description language itself. A set 
of encoding rules specify how the ASN.1 messages are serialized for transmission 
on the wire. Encoder and decoder pairs are always defined with respect to a specific
schema and encoding rule. There are at least nine standardized ASN.1 encoding
rules. Most rules describe 8-bit byte (octet) encodings, but three rule sets are 
dedicated to XML encoding. Common encoding rules include the Basic Encoding 
Rules (BER), Distinguished Encoding Rules (DER), and Packed Encoding Rules 
(PER). The encoding rules do not specify the transport layer protocol to use (or 
any lower-level protocols, such as the link or physical layer). 


3.2 Example ASN.1 Specification 


To get a concrete flavor of ASN.1, we present an example data schema. Let us 
assume we are defining messages that are sent (TX) and received (RX) in a 
query-response protocol. 


MsgTx ::= SEQUENCE {
    txID   INTEGER(1..5),
    txTag  UTF8String
}
MsgRx ::= SEQUENCE {
    rxID   INTEGER(1..7),
    rxTag  SEQUENCE(SIZE(0..10)) OF INTEGER
}


We have defined two top-level types, each a SEQUENCE type. A SEQUENCE is a
named tuple of fields (like a C struct). The MsgTx sequence contains two fields:
txID and txTag. These are typed with built-in ASN.1 types. In the definition
of MsgRx, the second field, rxTag, has the structured type SEQUENCE OF; it is
equivalent to an array of integers whose length is between 0 and 10, inclusive.
Note that the txID and rxID fields are constrained integers whose values must
fall within the given ranges.

ASN.1 allows us to write values of defined types. The following is a value of 
type MsgTx: 


msgTx MsgTx ::= {
    txID   1,
    txTag  "Some msg"
}
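To make the constraints of the example concrete, the following C sketch shows one possible in-memory representation of MsgRx together with a validity check. The struct layout and the function name `msgrx_valid` are illustrative assumptions, not the code that HAAW (Sect. 4.1) actually generates; in particular, an unconstrained ASN.1 INTEGER is unbounded, so storing rxTag elements as `long` is itself a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C view of
   MsgRx ::= SEQUENCE { rxID INTEGER(1..7),
                        rxTag SEQUENCE(SIZE(0..10)) OF INTEGER } */
#define RXTAG_MAX 10

typedef struct {
    long rxID;                 /* must lie in 1..7 */
    size_t rxTag_len;          /* must lie in 0..10 */
    long rxTag[RXTAG_MAX];     /* unbounded INTEGERs, truncated to long here */
} MsgRx;

/* Returns true iff the value satisfies the schema's constraints. */
bool msgrx_valid(const MsgRx *m) {
    assert(m != NULL);
    return m->rxID >= 1 && m->rxID <= 7 && m->rxTag_len <= RXTAG_MAX;
}
```

A fixed-size array bounded by the SIZE constraint is one natural design choice for C, since it avoids dynamic allocation; the price is that the length field must be checked before any element is accessed.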


3.3 ASN.1 Security 


There are currently over 100 vulnerabilities associated with ASN.1 in the MITRE
Common Vulnerabilities and Exposures (CVE) database [14]. These vulnerabilities
cover many vendor implementations as well as encoders and decoders embedded
in other software libraries (e.g., OpenSSL, Firefox, Chrome, OS X, etc.). The
vulnerabilities are often manifested as low-level programming vulnerabilities.
A typical class of vulnerabilities is illegal memory reads/writes, such as
buffer overflows, over-reads, and NULL-pointer dereferences. While generally
arcane, ASN.1 was recently featured in the popular press when a flaw in an ASN.1
vendor's code was found in telecom systems, ranging from cell tower radios to
cellphone baseband chips [11]; an exploit could conceivably take down an entire
mobile phone network.

Multiple aspects of ASN.1 combine to make ASN.1 implementations a rich 
source for security vulnerabilities. One reason is that many encode/decode 
pairs are hand-written and ad-hoc. There are a few reasons for using ad-hoc 
encoders/decoders. While ASN.1 compilers exist that can generate encoders and 
decoders (we describe one in Sect. 4.1), many tools ignore portions of the ASN.1 
specification or do not support all encoding standards, given the complexity and 
breadth of the language. A particular protocol may depend on ASN.1 language 
features or encodings unsupported by most existing tools. Tools that support 
the full language are generally proprietary and expensive. Finally, generated 
encoders/decoders might be too large or incompatible with the larger system 
(e.g., a web browser), due to licensing or interface incompatibilities. 

Even if an ASN.1 compiler is used, the compiler will include significant hand- 
written libraries that deal with, e.g., serializing or deserializing base types and 
memory allocation. For example, the unaligned packed encoding rules (UPER) 
require tedious bit operations to encode types into a compact bit-vector repre- 
sentation. Indeed, the recent vulnerability discovered in telecom systems is not 
in protocol-specific generated code, but in the associated libraries [11]. 
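To illustrate the kind of bit manipulation involved, the sketch below encodes and decodes a constrained integer such as txID INTEGER(1..5) in the manner of the unaligned PER variant: the value minus the lower bound is written in the minimum number of bits needed for the range (3 bits for 1..5). The `BitBuf` type and all function names are hypothetical, and the sketch ignores extensibility markers, length determinants and large ranges, all of which a real UPER library must handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit buffer; bits are written most significant first. */
typedef struct {
    uint8_t *buf;
    size_t cap_bits;   /* capacity in bits */
    size_t pos;        /* next bit position */
} BitBuf;

/* Number of bits needed to represent 0..max (0 bits when max == 0). */
static unsigned bits_for(uint64_t max) {
    unsigned n = 0;
    while (max > 0) { n++; max >>= 1; }
    return n;
}

static bool put_bits(BitBuf *b, uint64_t v, unsigned n) {
    if (b->pos + n > b->cap_bits) return false;   /* would overflow buf */
    for (unsigned i = n; i-- > 0; b->pos++) {
        uint8_t mask = (uint8_t)(0x80u >> (b->pos % 8));
        if ((v >> i) & 1u) b->buf[b->pos / 8] |= mask;
        else               b->buf[b->pos / 8] &= (uint8_t)~mask;
    }
    return true;
}

static bool get_bits(BitBuf *b, unsigned n, uint64_t *out) {
    if (b->pos + n > b->cap_bits) return false;   /* would over-read buf */
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++, b->pos++)
        v = (v << 1) | ((b->buf[b->pos / 8] >> (7 - b->pos % 8)) & 1u);
    *out = v;
    return true;
}

/* Encodes v in lo..hi as (v - lo) in the minimum number of bits. */
bool enc_constrained(BitBuf *b, int64_t v, int64_t lo, int64_t hi) {
    assert(lo <= hi);
    if (v < lo || v > hi) return false;
    return put_bits(b, (uint64_t)(v - lo), bits_for((uint64_t)(hi - lo)));
}

/* Decodes a value in lo..hi; rejects offsets that fall outside the range. */
bool dec_constrained(BitBuf *b, int64_t lo, int64_t hi, int64_t *v) {
    assert(lo <= hi);
    uint64_t off;
    if (!get_bits(b, bits_for((uint64_t)(hi - lo)), &off)) return false;
    if (off > (uint64_t)(hi - lo)) return false;
    *v = lo + (int64_t)off;
    return true;
}
```

For example, encoding txID = 4 in 1..5 followed by rxID = 7 in 1..7 into a zeroed byte yields the bits 011 110, i.e. the byte 0x78. Every bounds check above guards against exactly the overflow and over-read bugs discussed in this section, and the out-of-range test in the decoder is what a rejection-style property exercises.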

Finally, because ASN.1 is regularly used in embedded and performance- 
critical systems, encoders/decoders are regularly written in unsafe languages, 
like C. As noted above, many of the critical security vulnerabilities in ASN.1 
encoders/decoders are memory safety vulnerabilities in C. 


418 M. Tullsen et al. 


4 Our Tools for Generating and Verifying ASN.1 Code 


We briefly introduce the two tools used in this work. First we introduce our 
ASN.1 compiler for generating the encode/decode pair, then we introduce the 
symbolic analysis engine used in the verification. 


4.1 High-Assurance ASN.1 Workbench (HAAW) 


Our High-Assurance ASN.1 Workbench (HAAW) is a suite of tools developed 
by Galois that supports each stage of the ASN.1 protocol development lifecycle: 
specification, design, development, and evaluation. It is composed of an inter- 
preter, compiler, and validator, albeit with varying levels of maturity. HAAW is 
implemented in Haskell. 

The HAAW compiler is built using semi-formal design techniques and is
thoroughly tested to help ensure correctness. The implementation of the HAAW
compiler is structured to be as manifestly correct as feasible. It effectively
imports a (separately tested) ASN.1 interpreter, which is then “partially
evaluated” on the fly to generate code. The passes are as follows. First, an input
ASN.1 specification is “massaged” into a specification-like form that can be
interpreted by a built-in ASN.1 interpreter. Second, this specification-like form
is combined with the interpreter code and converted into a lambda-calculus
representation, to which we apply multiple optimization rules. Third, we
“sequentialize” the result into a monadic lambda calculus (where we are left with
the lambda calculus, sequencing operators, and encoding/decoding primitives).
Finally, this last representation is transformed into C code. The generated code
is linked with a library that encodes and decodes the basic ASN.1 types.

Although the HAAW compiler is designed to improve the quality of the generated
code, we verify the generated code and libraries directly, so HAAW is not part
of the trusted code base.


4.2 The Software Analysis Workbench (SAW) 


The Software Analysis Workbench (SAW)¹ is Galois’ open-source,
state-of-the-art symbolic analysis engine for multiple programming languages.
Here we briefly introduce SAW; see Dockins et al. [6] for more details.

An essential goal of SAW is to generate semantic models of programs inde- 
pendent of a particular analysis task and to interface with existing automated 
reasoning tools. SAW is intended to be mostly automated but supports user- 
guidance to improve scalability. 

The high-level architecture of SAW is shown in Fig. 1. At the heart of SAW 
is SAWCore. SAWCore is SAW’s intermediate representation (IR) of programs. 
SAWCore is a dependently-typed functional language, providing a functional rep- 
resentation of the semantics of a variety of imperative and functional languages. 


1 saw.galois.com. 
Fig. 1. SAW architecture, reproduced from [6].


SAWCore includes common built-in rewrite rules. Additionally, users can pro- 
vide domain-specific rewrite rules, and because SAWCore is a dependently-typed 
language, rewrite rules can be given expressive types to prove their correctness. 

SAW currently supports automated translation of both low-level virtual 
machine (LLVM) and Java virtual machine (JVM) into SAWCore. Thus, pro- 
gramming languages that can be compiled to these two targets are supported 
by SAW. Indeed, SAW can be used to prove the equivalence between programs 
written in C and Java. 

SAWCore can also be generated from Cryptol. Cryptol is an open-source 
language² for the specification and formal verification of bit-precise algorithms
[10], and we use it to specify portions of our code, as we describe in Sect. 6. 

A particularly interesting feature of Cryptol is that it is a typed functional 
language, similar to Haskell, but includes a size-polymorphic type system that 
includes linear integer constraints. To give a feeling for the language, the con- 
catenate operator (#) in Cryptol has the following type: 


(#) : {fst, snd, a} (fin fst)
      => [fst]a -> [snd]a -> [fst + snd]a


It concatenates two sequences containing elements of type a, the first of length 
fst—which is constrained to be of finite (fin) length (infinite sequences are 
expressible in Cryptol)—and the second of length snd. The return type is a 
sequence of a’s of length fst + snd. Cryptol relies on satisfiability modulo the- 
ories (SMT) solving for type-checking. 

SAWCore is typically exported to various formats supported by external 
third-party solvers. These include SAT solver representations (and-inverter
graphs (AIG), conjunctive normal form (CNF), and ABC’s format [3]), as well
as SMT-Lib2 [2], supported by a range of SMT solvers. 


2 https://cryptol.net/.
SAW allows bit-precise reasoning about programs, and has been used to prove that
optimized cryptographic software is correct [6]. SAW’s bit-level reasoning is also
useful for encode/decode verification, and in particular, ASN.1’s UPER encoding 
includes substantial bit-level operations. 

Finally, SAW includes SAWScript, a scripting language that drives SAW and 
connects specifications with code. 


5 Properties: Encode/Decode Self Consistency 


Ideally, we would prove full functional correctness for the encode/decode pair: 
that they correctly implement the ASN.1 UPER encoding/decoding rules for 
the ASN.1 types defined in SAE J2735. However, to develop a specification 
that would formalize all the required ASN.1 constructs, their semantics, and the 
proper UPER encoding rules would be an extremely large and tedious
undertaking (decades of “man-years”?). Moreover, it is not clear how one would ensure
the correctness of such a specification. 

Instead of proving full functional correctness, we prove a weaker property 
by proving consistency between the encoder and decoder implementations. We 
call our internal consistency property self-consistency, which we define as the 
conjunction of two properties, round-trip and rejection. We show that self- 
consistency implies that decode is the inverse of encode, which is an intuitive 
property we want for an encode/decode pair. 

The round-trip property states that a valid message that is encoded and then 
decoded results in the original message. This is a completeness property insofar 
as the decoder can decode all valid messages. 

A less obvious property is the rejection property. The rejection property infor- 
mally states that any invalid byte stream is rejected by the decoder. This is a 
soundness property insofar as the decoder only decodes valid messages. 

In the context of general ASN.1 encoders/decoders, let us fix a schema $S$ and
an encoding rule. Let $M_S$ be the set of all ASN.1 abstract messages that satisfy
the schema. Let $B$ be the set of all finite byte streams. Let
$\mathit{enc}_S : M_S \to B$ be an encoder, a total function on $M_S$. Let
$\mathit{error}$ be a fixed constant such that $\mathit{error} \notin M_S$. Let the
total function $\mathit{dec}_S : B \to (M_S \cup \{\mathit{error}\})$ be its
corresponding decoder.

The round-trip and rejection properties can respectively be stated as follows: 


Definition 1 (Round-trip)
$$\forall m \in M_S.\ \mathit{dec}_S(\mathit{enc}_S(m)) = m.$$
Definition 2 (Rejection)
$$\forall b \in B.\ \mathit{dec}_S(b) = \mathit{error} \lor \mathit{enc}_S(\mathit{dec}_S(b)) = b.$$


The two properties are independent: a decoder could properly decode valid 
byte streams while mapping invalid byte streams to valid messages. Such a 
decoder would be allowed by Round-trip but not by Rejection. An encode/decode
pair that fails the Rejection property could mean that dec does not terminate 
normally on some inputs (note that error is a valid return value of dec). Clearly, 
undefined behavior in the decoder is a security risk. 


Definition 3 (Self-consistency). An encode/decode pair $\mathit{enc}_S$ and
$\mathit{dec}_S$ is self-consistent if and only if it satisfies the round-trip and rejection properties.


Self-consistency does not require any reference to a specification of ASN.1
encoding rules, which simplifies the verification. Indeed, the round-trip and
rejection properties are applicable to any encode/decode pair of functions.

However, as noted at the outset, self-consistency does not imply full func-
tional correctness. For example, for an encoder $\mathit{enc}_s$ and decoder $\mathit{dec}_s$ pair,
suppose the messages are $M_s = \{m_0, m_1\}$ and the byte streams include
$\{b_0, b_1\} \subseteq B$. Suppose that according to the specification, it should be the
case that $\mathit{enc}_s(m_0) = b_0$, $\mathit{enc}_s(m_1) = b_1$, $\mathit{dec}_s(b_0) = m_0$ and $\mathit{dec}_s(b_1) = m_1$, and
for all $b \in B$ such that $b \neq b_0$ and $b \neq b_1$, $\mathit{dec}_s(b) = \mathit{error}$. However, suppose
that in fact $\mathit{enc}_s(m_0) = b_1$, $\mathit{enc}_s(m_1) = b_0$, $\mathit{dec}_s(b_0) = m_1$ and $\mathit{dec}_s(b_1) = m_0$,
and for all other $b \in B$, $\mathit{dec}_s(b) = \mathit{error}$. Then $\mathit{enc}_s$ and $\mathit{dec}_s$ satisfy both the
round-trip and rejection properties, while being incorrect.
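The swapped-encoding counterexample can be checked mechanically. In this C sketch, which is an illustration under assumed values rather than code from our toolchain, $m_0$ and $m_1$ are the integers 0 and 1, $b_0$ and $b_1$ are the one-byte streams 0x00 and 0x01, and `spec_enc`, `impl_enc`, and `impl_dec` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define ERR (-1)   /* stands for `error` */

/* Encoding required by the (hypothetical) specification: m0 -> b0, m1 -> b1. */
static unsigned char spec_enc(int m) {
    assert(m == 0 || m == 1);
    return m == 0 ? 0x00 : 0x01;
}

/* Implementation that swaps the two encodings: m0 -> b1, m1 -> b0. */
static unsigned char impl_enc(int m) {
    assert(m == 0 || m == 1);
    return m == 0 ? 0x01 : 0x00;
}

/* Matching decoder: b0 -> m1, b1 -> m0, every other byte -> error. */
static int impl_dec(unsigned char b) {
    if (b == 0x00) return 1;
    if (b == 0x01) return 0;
    return ERR;
}

/* Round-trip and rejection, checked over both messages and all one-byte streams. */
static bool impl_self_consistent(void) {
    for (int m = 0; m <= 1; m++)
        if (impl_dec(impl_enc(m)) != m)
            return false;
    for (int b = 0; b <= 255; b++) {
        int m = impl_dec((unsigned char)b);
        if (m != ERR && impl_enc(m) != (unsigned char)b)
            return false;
    }
    return true;
}

/* Functional correctness of the encoder with respect to the specification. */
static bool impl_matches_spec(void) {
    for (int m = 0; m <= 1; m++)
        if (impl_enc(m) != spec_enc(m))
            return false;
    return true;
}
```

`impl_self_consistent()` returns true while `impl_matches_spec()` returns false, which is exactly the gap between self-consistency and correctness.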

That said, if self-consistency holds, then correctness reduces to showing that
either the encoder or the decoder matches its specification; showing that both do
is unnecessary.

In our work, we formally verify self-consistency and memory safety. We also 
give further, informal, evidence of correctness by both writing individual test 
vectors and by comparing our test vectors to those produced by other ASN.1
compilers. 


6 Verification 


Figure 2 summarizes the overall approach to generating and verifying the
encode/decode pair, which we reference throughout this section. 


6.1 First Steps 


The SAE J2735 ASN.1 specification (J2735.asn) is given as input to HAAW
to generate C code for the encoder and decoder. A HAAW standard library
is also emitted (the dotted line from HAAW to libHAAW.c in Fig. 2 denotes that
the standard library is not specific to the SAE J2735 specification and is not
compiled from HAAW).

We wrote the round-trip and rejection properties (Sect. 5) as two C functions. 
For example, the round-trip property is encoded, approximately, as follows: 


bool round_trip(BSM *msg_in) {
    unsigned char str[BUF_SIZE];  /* buffer for the encoded byte stream */
    BSM msg_out;                  /* storage for the decoded message */

    enc(msg_in, str);             /* encode the (symbolic) input message */
    dec(&msg_out, str);           /* decode the bytes just produced */
    return equal_msg(msg_in, &msg_out);
}


Fig. 2. Code generation and verification flow.


The actual round-trip property is slightly longer as we need to deal with C 
level setup, allocation, etc. This is why we chose to implement this property in 
C (rather than in SAWScript). 

Now all we need to do is verify, in SAWScript, that the C function round_trip
returns true for all inputs. At this point, it would be nice to say the power of our
automated tools was sufficient to prove round_trip without further programmer
intervention. This, unsurprisingly, was not the case. Most of the applications of
SAW have been to cryptographic algorithms, where code typically has loops with
statically known bounds. In our encoder/decoder code we have a number of loops
with unbounded iterations: given such code, we need to provide some guidance
to SAW.

In the following sections we present how we were able to use SAW, as well
as our knowledge of our specific code, to change an intractable verification task
into one that could be done (by automated tools) in less than 5 h. An important
note: the SAW techniques described in the rest of this section allow us to achieve
tractability; they do not change the soundness of our results.
6.2 Compositional Verification with SAW Overrides


SAW supports compositional verification. A function (e.g., compiled from Java or 
C) could be specified in Cryptol and verified against its specification. That Cryp- 
tol specification can then be used in analyzing the remainder of the program, 
such that in a symbolic simulation, the function is replaced with its specification. 
We call this replacement an override. Overrides can be used recursively and can 
dramatically improve the scalability of a symbolic simulation. SAW’s scripting 
language ensures by construction that an override has itself been verified. 

Overrides are like lemmas: we prove them once, separately, and can re-use
them (without re-proof). The lemma that an override provides is an equivalence 
between a C function and a declarative specification provided by the user (in 
Cryptol). The effort to write a specification and add an override is often required 
to manage intractability of the automated solvers used. 


6.3 Overriding “copy_bits” in SAW 


There are two critical libHAAW functions that we found to be intractable to verify
using symbolic simulation naively. Here we describe generating overrides for one 
of them: 


int copy_bits
  ( unsigned char *dst
  , uint32_t *dst_i
  , unsigned char const *src
  , uint32_t *src_i
  , uint32_t const length)
{
  /* bit index in src one past the last bit to copy */
  uint32_t src_i_bound = *src_i + length;
  while (*src_i < src_i_bound) {
    /* copies a run of bits and advances *dst_i and *src_i */
    copy_overlapping_bits(dst, dst_i, src, src_i, src_i_bound);
  }
  return 0;
}


The above function copies length bits from the src array to the dst array, 
starting at the bit indexed by src_i in src and index dst_i in dst; src_i and 
dst_i are incremented by the number of bits copied; copy_overlapping_bits is 
a tedious but loop-free function with bit-level computations to convert to/from 
a bit-field and byte array. This library function is called by both the encoder 
and decoder. 

One difficulty with symbolically executing copy_bits with SAW is that SAW 
unrolls loops. Without a priori knowledge of the values of length and src_i, there
is no upper bound on the number of iterations of the loop. Indeed, memory 
safety is dependent on an invariant holding between the indices, the number of 
bits to copy, and the length of the destination array: the length of the destination 
array is not passed to the function, so there is no explicit check to ensure no 
write-beyond-array in the destination array. 
Even if we could fix the buffer sizes and specify the relationship between
the length and indexes so that the loop could be unrolled in theory, in practice, 
it would still be computationally infeasible for large buffers. In particular, we 
would have to consider every valid combination of the length and start indexes, 
which is cubic in the bit-length of the buffers. 
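To make the cubic growth concrete, assume (our simplification, not a description of HAAW's actual calls) that the source and destination buffers each hold $N$ bits and that a call is valid whenever both bit ranges stay in bounds, i.e. $\mathit{src\_i} + \mathit{length} \le N$ and $\mathit{dst\_i} + \mathit{length} \le N$. For a fixed length $L$ there are $(N - L + 1)^2$ choices of start indexes, so

$$\sum_{L=0}^{N} (N - L + 1)^2 = \sum_{k=1}^{N+1} k^2 = \frac{(N+1)(N+2)(2N+3)}{6},$$

which grows as $N^3/3$. The C sketch below (function names hypothetical) computes the count both ways.

```c
#include <assert.h>
#include <stdint.h>

/* Number of (src_i, dst_i, length) triples with both bit ranges inside an
   n-bit buffer, summed over every length 0..n. */
static uint64_t count_by_sum(uint64_t n) {
    assert(n <= 1000000u);  /* keeps every intermediate value well within 64 bits */
    uint64_t count = 0;
    for (uint64_t len = 0; len <= n; len++) {
        uint64_t starts = n - len + 1;  /* valid start positions for this length */
        count += starts * starts;       /* independent choices of src_i and dst_i */
    }
    return count;
}

/* Closed form of the same sum: (n + 1)(n + 2)(2n + 3) / 6. */
static uint64_t count_closed_form(uint64_t n) {
    assert(n <= 1000000u);
    return (n + 1) * (n + 2) * (2 * n + 3) / 6;
}
```

For $N = 296$ (a 37-byte buffer), both functions return 8,776,845 combinations, each of which would still have to be explored with symbolic bit contents.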

To override copy_bits, we write a specification of copy_bits in Cryptol. 
The specification does not abstract the function, other than eliding the details 
of pointers, pointer arithmetic, and destructive updates in C. The specification 
is given below: 


copy_bits : {dst_n, src_n}
            [dst_n][8] -> [32] -> [src_n][8] -> [32] -> [32]
            -> ([dst_n][8], [32], [32])
copy_bits dst0 dst_i0 src src_i0 length = (dst1, dst_i1, src_i1)
  where
    dst_bits0 = join dst0
    src_bits0 = join src

    dst1 = split (copy dst_bits0 0)
    copy dst_bits i =
      if i == length
      then dst_bits
      else copy dst_bits'' (i + 1)
      where
        dst_bits'' = update dst_bits (dst_i0 + i)
                            (src_bits0 @ (src_i0 + i))

    dst_i1 = dst_i0 + length
    src_i1 = src_i0 + length


We refer to the Cryptol User Manual for implementation details [10], but to 
provide an intuition, we describe the type signature (the first three lines above): 
the type is polymorphic, parameterized by dst_n and src_n. A type [32] is a
bit-vector of length 32. A type [dst_n][8] is an array of length dst_n containing
byte values. The function takes a destination array of bytes, a 32-bit destination
index, a source array of bytes, a source index, and a length, and returns a triple
containing a new destination array, and new destination and source indices, 
respectively. Because the specification is pure, the values that are destructively 
updated through pointers in the C implementation are part of the return value 
in the specification. 
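For readers more at home in C than in Cryptol, the specification can be transliterated into a bit-at-a-time C model. This is a sketch, not the libHAAW implementation: `copy_bits_model`, `get_bit`, and `set_bit` are hypothetical names, and we assume the most-significant-bit-first ordering that Cryptol's `join` and `@` use, under which bit $i$ of a byte array is bit $7 - (i \bmod 8)$ (counting from the least significant bit) of byte $\lfloor i/8 \rfloor$.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of a byte array, MSB-first: bit 0 is the top bit of buf[0]. */
static unsigned get_bit(const unsigned char *buf, uint32_t i) {
    return (buf[i / 8] >> (7 - i % 8)) & 1u;
}

/* Set bit i (MSB-first) of a byte array to v (0 or 1). */
static void set_bit(unsigned char *buf, uint32_t i, unsigned v) {
    unsigned char mask = (unsigned char)(1u << (7 - i % 8));
    if (v)
        buf[i / 8] |= mask;
    else
        buf[i / 8] &= (unsigned char)~mask;
}

/* Mirrors the Cryptol spec: bit (dst_i0 + k) of dst receives bit
   (src_i0 + k) of src for k in [0, length), then both indices advance by
   length. Assumes both ranges lie inside their arrays, dst and src do not
   overlap, and (unlike Cryptol's wrapping [32] arithmetic) no index wraps. */
static void copy_bits_model(unsigned char *dst, uint32_t *dst_i,
                            const unsigned char *src, uint32_t *src_i,
                            uint32_t length) {
    assert(*dst_i <= UINT32_MAX - length && *src_i <= UINT32_MAX - length);
    for (uint32_t k = 0; k < length; k++)
        set_bit(dst, *dst_i + k, get_bit(src, *src_i + k));
    *dst_i += length;
    *src_i += length;
}
```

For example, with `src = {0xAB, 0xCD}`, a zeroed two-byte `dst`, `*src_i = 4`, `*dst_i = 3`, and `length = 8`, the bits 1011 1100 are written to positions 3 to 10, leaving `dst = {0x17, 0x80}` and the indices at 11 and 12. A model like this could serve as a test oracle; here, the equivalence is instead established with SAW.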


6.4 Multiple Overrides for “copy_bits” in SAW 


Even after providing the above override for copy_bits, we are still beyond 
the limits of our underlying solvers to automatically prove the equivalence of 
copy_bits with its Cryptol specification.

However, we observe that for the SAE J2735 encode/decode, copy_bits is
called with a relatively small number of specific concrete values for the sizes of
the dst and src arrays, the indexes dst_i and src_i, and the number of bits to
copy, length. The only values that we need to leave symbolic are the bit values
within the dst and src arrays. Therefore, rather than creating a single override
for an arbitrary call to copy_bits, we generate separate overrides for each unique
set of “specializable” arguments, i.e., dst_i, src_i, and length.

Thus we note another feature of SAW: SAW allows us to specify a set of con- 
crete function arguments for an override; for each of these, SAW will specialize 
the override. (I.e., it will prove each specialization of the override.) In our case 
this turns one intractable override into 56 tractable ones. The 56 specializations 
(which correspond to the number of SEQUENCE fields in the BSM specification)
were not determined by trial and error but by running instrumented
code. 
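The instrumentation can be as simple as a hook that records each distinct argument triple seen while running the encoder and decoder. The C sketch below is a hypothetical illustration (`record_copy_bits_call`, `CallArgs`, and the fixed table size are our assumptions, not part of libHAAW); each recorded triple would then correspond to one specialization of the override.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One concrete set of "specializable" copy_bits arguments. */
typedef struct {
    uint32_t dst_i, src_i, length;
} CallArgs;

#define MAX_SPECIALIZATIONS 256

static CallArgs seen[MAX_SPECIALIZATIONS];
static size_t n_seen = 0;

/* Hypothetical hook called at the start of an instrumented copy_bits:
   stores each distinct (dst_i, src_i, length) triple exactly once. */
static void record_copy_bits_call(uint32_t dst_i, uint32_t src_i, uint32_t length) {
    for (size_t k = 0; k < n_seen; k++)
        if (seen[k].dst_i == dst_i && seen[k].src_i == src_i &&
            seen[k].length == length)
            return;                      /* already recorded */
    assert(n_seen < MAX_SPECIALIZATIONS);
    seen[n_seen].dst_i = dst_i;
    seen[n_seen].src_i = src_i;
    seen[n_seen].length = length;
    n_seen++;
}
```

After a run, the first `n_seen` entries of `seen` list the specializations to prove. Since a missing specialization affects only efficiency, not soundness, an incomplete list is safe, just slow.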

It is important to note that a missing override specialization cannot change the
soundness of SAW’s result: overrides in SAW cannot change the proof results; they
only change the efficiency of proof finding. If we
had a missing override specialization for copy_bits we would only be back where 
we started: a property that takes “forever” to verify. 

This approach works well for the simple BSM Part I. However, once we 
begin to verify encoders/decoders for more complex ASN.1 specifications (e.g., 
containing CHOICE and OPTIONAL constructs), this method will need to be gen- 
eralized. 


6.5 Results 


A SAW script (script.saw) ties everything together: it drives the symbolic
execution in SAW, lifts LLVM variables and functions into a dependent logic
to state pre- and post-conditions, and provides Cryptol specifications as needed.
Finally, SAW generates an SMT problem; Z3 [5] is the default solver we use.

Just under 3100 lines of C code were verified, not counting blank or comment 
lines. The verification required writing just under 100 lines of Cryptol specifi- 
cation. There are 1200 lines of SAW script auto-generated by the test harness 
in generating the override specifications. Another 400 lines of SAW script are
hand-written for the remaining overrides and to drive the overall verification.

Executed on a modern laptop with an Intel Core i7-6700HQ 2.6 GHz proces- 
sor and 32GB of memory, the verification takes 20 min to prove the round-trip 
property and 275 min to prove the rejection property. The round-trip property 
is less expensive to verify because symbolic simulation is sensitive to branching, 
and for the round-trip property, we assert the data is valid to start, which in turn 
ensures that all of the decodings succeed. In rejection, on the other hand, we have 
a branch at each primitive decode, and we need to consider both possibilities 
(success and failure). 
7 Discussion


7.1 LLVM and Definedness 


Note that our verification has been with respect to the LLVM semantics, not the
C source of our code. SAW does not model C semantics, but inputs LLVM as the 
program’s semantics (we use CLANG to generate LLVM from the C). By verifying 
LLVM, SAW is made simpler (it need only model LLVM semantics rather than 
C) and we can do inter-program verification more easily. The process of proving 
that a program satisfies a given specification within SAW guarantees definedness 
of the program (and therefore memory safety) as a side effect. That is, the 
translation from LLVM into SAWCore provides a well-defined semantics for the 
program, and this process can only succeed if the program is well-defined. In 
some cases, this well-definedness is assumed during translation and then proved 
in the course of the specification verification. For instance, when analyzing a 
memory load, SAW generates a semantic model of what the program does if 
the load was within the bounds of the object it refers to, and generates a side 
condition that the load was indeed in bounds. 

Verifying LLVM rather than the source program is a double-edged sword. On 
the one hand, the compiler front-end that generates LLVM is removed from the 
trusted computing base. On the other hand, the verification may not be sound 
with respect to the program’s source semantics. In particular, C’s undefined 
behaviors are a superset of LLVM’s undefined behaviors; a compiler can soundly 
remove undefined behaviors but not introduce them. For example, a flaw in the 
GCC compiler allowed the potential for an integer overflow when multiplying 
the size of a storage element by the number of elements. The result could be 
insufficient memory being allocated, leading to a subsequent buffer overflow. 
CLANG, however, introduces an implicit trap on overflow [12]. 
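The class of bug described above is easy to reproduce in C: computing `count * size` in `size_t` silently wraps, so the allocator can receive a much smaller size than intended. A minimal sketch of the usual guard follows; `mul_overflows` and `alloc_array_checked` are hypothetical helpers, not code from GCC or CLANG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* True when a * b does not fit in size_t (unsigned multiplication would wrap). */
static bool mul_overflows(size_t a, size_t b) {
    return b != 0 && a > SIZE_MAX / b;
}

/* Allocate count elements of `size` bytes each, or return NULL when the
   total byte count is not representable, instead of allocating a wrapped size. */
static void *alloc_array_checked(size_t count, size_t size) {
    if (mul_overflows(count, size))
        return NULL;
    return malloc(count * size);
}
```

With the check, `alloc_array_checked(SIZE_MAX, 2)` returns NULL rather than passing a wrapped size to `malloc`.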

Moreover, the LLVM language reference does not rigorously specify well- 
definedness, and it is possible that our formalization of LLVM diverges from a 
particular compiler’s [13]. 


7.2 Other Assumptions 


We made some memory safety assumptions about how the encode/decode rou- 
tines are invoked. First, we assume that the input and output buffers provided 
to the encoder and decoder, respectively, do not alias. We also assume that each 
buffer is 37 bytes long (sufficient to hold a BSM with Part I only). A meta 
argument shows that buffers of at least 37 bytes are safe: we verify that for all 
37-byte buffers, we never read or write past their ends. So, if the buffers were 
longer, we would never read the bytes above the 37th element. 

For more complex data schemas (and when we extend to BSM Part II) whose
messages require a varying octet size, we would need to ensure the buffers are 
sufficiently large for all message sizes. 
7.3 Proof Robustness


By “proof robustness” we mean how much effort is required to verify another 
protocol or changes to the protocol. We hypothesize that for other protocols 
that use UPER and a similar set of ASN.1 constructs, the verification effort 
would be small. Most of our manual effort focused on the libHAAW library,
which is independent of the particular ASN.1 protocol verified. That said, very
large protocol specifications may require additional proof effort to make them 
compositional. 

In future work, we plan to remove the need to generate overrides as a separate 
step (as described in Sect. 6.2) by modifying HAAW to generate overrides as it
generates the C code. 


8 Conclusion 


Hopefully we have motivated the security threat to V2V and the need for elimi-
nating vulnerabilities in ASN.1 code. We have presented a successful application
of automated formal methods to real C code for a real-world application domain.
There are some lessons to be learned from this work:


(1) Fully automated proofs of correctness properties are possible, but not trivial. 
The encoding of properties into C and SAWScript and getting the proofs
to go through took one engineer approximately 3 months; this engineer had
some experience with SAW. We were also able to get support and bug-fixes
from the SAW developers. (It also helped that the code was bug-free so no 
“verification” time was spent on finding counter-examples and fixing code.) 

(2) The straightforward structure of the C used in the encode/decode routines 
made them more amenable to automated analysis (see Sect. 6). It certainly 
helped that the code verified was compiler-generated and was by design 
intended to be, to some degree, manifestly correct. The lesson is not “choose 
low-hanging fruit” but “look, low-hanging fruit in safety critical code” or 
possibly even “create low-hanging fruit!” (by using simpler C). 

(3) For non-trivial software, the likelihood of having a correct specification at 
hand, or having the resources to create it, is quite slim! For instance, to 
fully specify correct UPER encoding/decoding for arbitrary ASN.1 specifi- 
cations would be a Herculean task. But in our case, we formulated two sim- 
ple properties—Round-Trip and Rejection—and by proving them we have 
also shown memory safety and some strong (not complete, see Sect. 5) guar- 
antees of functional correctness. This technique could be applied to any 
encode/decode pair. 


There are many ways we hope to extend this work: 


(1) We plan to extend our verification to the full BSM. This now gets us to more 
challenging ASN.1 constructs (e.g., CHOICE) that involve a more complicated 
control-flow in the encoders/decoders. We do not expect a proof to be found 
automatically, but our plan is to generate lemmas with the generated C code 
that will allow proofs to go through automatically. 
Once we can automatically verify the full BSM, we expect to be able to
perform a similar fully-automatic verification on many ASN.1 specifications 
(most do not use the full power of ASN.1). We would like to explore what 
properties of a given ASN.1 specification might guarantee the ability to 
perform such a fully-automatic verification. 

By necessity, parts of our SAWScript and the verification properties have 
a dependence on the particular API of the HAAW compiler (how abstract 
values are encoded, details of the encoding/decoding functions, memory- 
management design choices, etc.). Currently the authors are working on gen- 
eralizing this so that one can abstract over ASN.1-tool-specific API issues. 
The goal is to be able to extend our results to other encode/decode pairs 
(generated by hand or by other ASN.1 compilers). 

Note that the self-consistency property is generic (and has no reference to 
ASN.1). As a result, we believe our work can be extended to encode/decode 
pairs on non-ASN.1 data. 
Abstract. We describe formal verification of s2n, the open source TLS
implementation used in numerous Amazon services. A key aspect of this 
proof infrastructure is continuous checking, to ensure that properties 
remain proven during the lifetime of the software. At each change to the 
code, proofs are automatically re-established with little to no interac- 
tion from the developers. We describe the proof itself and the technical 
decisions that enabled integration into development. 


1 Introduction 


The Transport Layer Security (TLS) protocol is responsible for much of the 
privacy and authentication we enjoy on the Internet today. It secures our phone 
calls, our web browsing, and connections between resources in the cloud made 
on our behalf. In this paper we describe an effort to prove the correctness of 
s2n [3], the open source TLS implementation used by many Amazon and Amazon
Web Services (AWS) products (e.g. Amazon S3 [2]). Formal verification plays
an important role for s2n. First, many security-focused customers (e.g. financial 
services, government, pharmaceutical) are moving workloads from their own data 
centers to AWS. Formal verification provides customers from these industries 
with concrete information about how security is established in Amazon Web 
Services. Secondly, automatic and continuous formal verification facilitates rapid 
and cost-efficient development by a distributed team of developers. 

In order to realize the second goal, verification must continue to work with 
low effort as developers change the code. While fundamental advances have been 
made in recent years in the tractability of full verification, these techniques 
generally either: (1) target a fixed version of the software, requiring significant
re-proof effort whenever the software changes, or (2) are designed around synthesis
of correct code from specifications. Neither of these approaches would work for 
Amazon as s2n is under continuous development, and new versions of the code 
would not automatically inherit correctness from proofs of previous versions. 
© The Author(s) 2018 
To address the challenge of program proving in such a development environment,
we built a proof and associated infrastructure for s2n’s implementations
of DRBG, HMAC, and the TLS handshake. The proof targets an existing imple- 
mentation and is updated either automatically or with low effort as the code 
changes. Furthermore, the proof connects with existing proofs of security prop- 
erties, providing a high level of assurance. 

Our proof is now deployed in the continuous integration environment for 
s2n, and provides a distributed team of developers with repeated proofs of the 
correctness of s2n even as they continue to modify the code. In this paper, we 
describe how we structured the proof and its supporting infrastructure, so that 
the lessons we learned will be useful to others who address similar challenges. 

Figure 1 gives an overview of our proof for s2n’s implementation of the HMAC 
algorithm and the tooling involved. At the left is the ultimate security property 
of interest, which for HMAC is that if the key is not known, then HMAC is indis- 
tinguishable from a random function (given some assumptions on the underlying 
hash functions). This is a fixed security property for HMAC and almost never 
changes (a change would correspond to some new way of thinking about security 
in the cryptographic research community). The HMAC specification is also fairly 
static, having been updated only once since its publication in 2002. Beringer 
et al. [6] have published a mechanized formal proof that the high-level HMAC 
specification establishes the cryptographic security property of interest. 

As we move to the right through Fig. 1, we find increasingly low-level arti- 
facts and the rate of change of these artifacts increases. The low-level HMAC 
specification includes details of the API exposed by the implementation, and 
the implementation itself includes details such as memory management and per- 
formance optimizations. This paper focuses on verifying these components in a 
manner that uses proof automation to decrease the manual effort required for 
ongoing maintenance of these verification artifacts. At the same time, we ensure 
that the automated proof occurring on the right-hand side of the figure is linked 
to the stable, foundational security results present at the left. 

In this way, we realize the assurance benefit of the foundational security 
work of Beringer et al. while producing a proof that can be integrated into the 
development workflow. The proof is applied as part of the continuous integration 
system for s2n (which uses Travis CI) and runs every time a code change is 
pushed or a pull request is issued. In one year of code changes only three manual 
updates to the proof were required. 

The s2n source code, proof scripts, and access to the underlying proof tools 
can all be found in the s2n GitHub [3] repository. The collection of proof runs 
is logged and appears on the s2n Travis CI page [4]. 

In addition to the HMAC proof, we also reused the approach shown in
the right-hand side of Fig. 1 to verify the deterministic random bit generator
(DRBG) algorithm and the TLS Handshake protocol. In these cases we didn't
link to foundational cryptographic security proofs, but nonetheless had specifi-
cations that provided important benefits to developers by allowing them to (1)
check their code against an independent specification and (2) check that their
code continues to adhere to this specification as it changes. Our TLS Handshake
proof revealed a bug (which was promptly fixed) in the s2n implementation [10],
providing evidence for the first point. All of our proofs have continued to be used
in development since their introduction, supporting the second point.

1 And this update did not change the functional behavior specified in the standard.


[Figure: a left-to-right chain annotated “Increasing Automation” and “Changes Infrequently → Changes Frequently”: High-Level Security Property (indistinguishability from random) → Coq HMAC Specification → High-Level Specification → Low-Level (incremental) Specification → Implementation (s2n C code, successive versions). The links are labeled, in order, “Proved in Coq”, “Proved in Coq”, “Combination of Coq (Manual) and Cryptol (Automatic)”, and “Proved with SAW (mostly automatic)”; the left part is marked “Work of Beringer et al.” and the right part “This Paper”.]
Fig. 1. An overview of the structure of our HMAC proof.


Related Work. Projects such as Everest [8,12], Cao [5], and Jasmin [1] gener-
ate verified cryptographic implementations from higher level specifications, e.g.
F* models. While progress in this space continues to be promising (HACL* has
recently achieved performance on primitives that surpasses handwritten C [25]),
we have found in our experiments that the generated TLS code does not yet meet
the performance, power, and space constraints required by the broad range of
AWS products that use s2n.

Static analysis for hand-written cryptographic implementations has been pre- 
viously reported in the context of Frama-C/PolarSSL [23], focusing on scaling 
memory safety verification to a large body of code. Additionally, unsound but 
effective bug hunting techniques such as fuzzing have been applied to TLS imple- 
mentations in the past [11,18]. The work we report on goes further by proving 
behavioral correctness properties of the implementation that are beyond the 
capabilities of these techniques. We were helped in this by the fact that the
implementation of s2n is small (less than 10k LOC) and most iteration is bounded.

The goal of our work is to verify deep properties of an existing and actively 
developed open source TLS implementation that has been developed for both 
high performance and low power as required by a diverse range of AWS prod- 
ucts. Our approach was guided by lessons learned in several previous attempts 
to prove the correctness of s2n that either (1) required too much developer 
interaction during the modification of the code [17], or (2) relied on push-button
symbolic model checking tools that did not scale. Similarly, proofs developed using
tools from the Verified Software Toolchain (VST) [6] are valuable for establishing 
the correctness and security of specifications, but are not sufficiently resilient to 
code changes, making them challenging to integrate into an ongoing develop- 
ment process. Their use of a layered proof structure, however, provided us with 
a specification that we could use to leverage their security proof in our work. 
O’Hearn details the industry impact of continuous reasoning about code
in [19], and describes additional instances of integration of formal methods with 
developer workflows. 


2 Proof of HMAC 


In this section, we walk through our HMAC proof in detail, highlighting how 
the proof is decomposed, the guarantees provided, the tools used, and how this 
approach supports integration of verification into the development work-flow. 
While HMAC serves as an example, we have also performed a similar proof of 
the DRBG and TLS Handshake implementations. We do not discuss DRBG 
further, as there are no proof details that differ significantly from HMAC. We 
describe our TLS verification in Sect. 3. 


2.1 High-Level HMAC Specification 


The keyed-Hash Message Authentication Code algorithm (HMAC) is used for 
authenticated integrity in TLS 1.2. Authenticated integrity guarantees that the 
data originated from the sender and was not changed or duplicated in transit. 
HMAC is used as the foundation of the TLS Pseudorandom Function (PRF), 
from which the data transmission and data authentication shared keys are 
derived. This ensures that both the sender and recipient have exchanged the 
correct secrets before a TLS connection can proceed to the data transmission 
phase. 

HMAC is also used by some TLS cipher suites to authenticate the integrity 
of TLS records in the data transmission phase. This ensures, for example, that 
a third party watching the TLS connection between a user and a webmail client 
is unable to change or repeat the contents of an email body during transmission. 
It is also used by the HMAC-based Extract-and-Expand Key Derivation Func- 
tion (HKDF) which is implemented within s2n as a utility function for general 
purpose key derivation and is central to the design of the TLS 1.3 PRF.

FIPS 198-1 [24] defines the HMAC algorithm as 


$$\mathrm{HMAC}(K, \mathit{message}) = H((K \oplus \mathit{opad}) \,\|\, H((K \oplus \mathit{ipad}) \,\|\, \mathit{message}))$$


where $H$ is any hash function, $\oplus$ is bitwise xor, and $\|$ is concatenation. opad and
ipad are constants defined by the specification. We will refer to this definition 
as the monolithic specification. 

Following Fig. 1, we use the Cryptol specification language [14] to express 
HMAC in a form suitable for mechanized verification, first in a monolithic form, 
and then in an incremental form. We prove high-level properties with Coq [22] 
and tie these to the code using the Software Analysis Workbench (SAW) [16]. 
We first describe the proof of high-level properties before going into specifics 
regarding the tools in Sect. 2.4. 
2.2 Security Properties of HMAC


The Cryptol version of the Monolithic HMAC specification follows.
hmac k message = H((k ^ opad) # H((k ^ ipad) # message)) 


where H is any hash function, ^ is bitwise xor, and # is concatenation.

The high-level Cryptol specification and the FIPS document look nearly iden- 
tical, but what assurance do we have that either description of the algorithm 
is cryptographically secure? We can provide this assurance by showing that the 
Cryptol specification establishes one of the security properties that HMAC is 
intended to provide—namely, that HMAC is indistinguishable from a function 
returning random bits. 

Indistinguishability from random is a property of cryptographic output that 
says that there is no effective strategy by which an attacker that is viewing the 
output of the cryptographic function and a true random output can distinguish 
the two, where an “effective” strategy is one that has a non-negligible chance 
of success given bounded computing resources. If the output of a cryptographic 
function is indistinguishable from random, that implies that no information can 
be learned about the inputs of that function by examining the outputs. 

We prove that our Cryptol HMAC specification has this indistinguishability 
property using an operational semantics of Cryptol we developed in Coq. The 
semantics enable us to reuse portions of the proof by Beringer et al. [6], which
uses the Coq Foundational Cryptography Framework (FCF) library [20] to estab- 
lish the security of the HMAC construction. We construct a Coq proof showing 
that our Cryptol specification is equivalent (when interpreted using the formal 
operational semantics) to the specification considered in the Beringer et al. work.
The Cryptol specification is a stepping stone to automated verification of the 
s2n implementations, so when combined with the verification work we describe 
subsequently, we eventually establish that the implementation of HMAC in s2n 
also has the desired security property. The Coq code directly relating to HMAC 
is all on the s2n GitHub page. These proofs are not run as part of continuous
integration; rather, they are only rerun in the unlikely event that the monolithic
specification changes. 


2.3 Low-Level Specification 


The formal specification of HMAC presented in the FIPS standard operates on 
a single complete message. However, network communication often requires the 
incremental processing of messages. Thus all modern implementations of HMAC 
provide an incremental interface with the following abstract types: 


init   : Key -> State
update : Message -> State -> State 
digest : State -> MAC 


The init function creates a state from a key, the update function updates 
that state incrementally with chunks of the message, and the digest function 
finalizes the state, producing the MAC. 
The one-line monolithic specification is related to these incremental functions
as follows. If we can partition a message $m$ into $m = m_1 \| m_2 \| \cdots \| m_n$, then (in
pseudo code/logic notation)

$$\mathrm{HMAC}(k, m) = \mathit{digest}(\mathit{update}\ m_n\ (\cdots (\mathit{update}\ m_1\ (\mathit{init}\ k)))) \qquad (1)$$


In other words, any MAC generated by partitioning a message and incrementally 
sending it in order through these functions should be equal to a MAC generated 
by the complete message HMAC interface used in the specification. 

We prove that the incremental interface to HMAC is equivalent to the non- 
incremental version using a combination of manual proof in Coq and auto- 
mated proof in Cryptol. Note that this equivalence property can be stated in 
an implementation-independent manner and proved outside of a program veri- 
fication context. This is the approach we take—independently proving that the 
incremental and monolithic message interfaces compute the same HMAC, and 
then separately showing that s2n correctly implements the incremental interface. 

Our Coq proof proceeds via induction over the number of partitions with 
the following lemmas establishing the relationship between the monolithic and 
iterative implementations. These lemmas are introduced as axioms in the Coq 
proof, but subsequently checked using SAW. 


update_empty : forall s, HMAC_update empty_string s = s. 


equiv_one : forall m k, 
HMAC_digest (HMAC_update m (HMAC_init k)) = HMAC k m. 


update_concat : forall m1 m2 s, 
HMAC_update (concat m1 m2) s = HMAC_update m2 (HMAC_update m1 s). 


The first lemma states that processing an empty message does not change 
the state. The second lemma states that applying the incremental interface to a 
single message is equivalent to applying the monolithic interface. These lemmas 
constitute the base cases for an inductive proof of equation (1) above. The last 
lemma states that calling update twice (first with m1 and then with m2) results 
in the same state as calling update once with m1 concatenated with m2. This 
constitutes the inductive step in the proof of (1). 

The update_empty lemma can be proved by analyzing the code with symbolic 
values provided for the state s, as the state is of fixed size. The equiv_one and 


update_concat lemmas require reasoning about unbounded data. SAW has limited
support for such proofs. In particular, it has support for equational rewriting
of terms in its intermediate language, but not for induction. In the case of the 
update_concat lemma, a few simple builtin rewrite rules are sufficient to estab- 
lish the statement for all message sizes. For equiv_one, a proof of the statement 
for all message sizes would require induction. Since SAW does not support induc- 
tion, we prove that this statement holds for a finite number of key and message 
sizes. In theory we could still obtain a complete proof by checking all message 
sizes up to 16k bytes (the maximum size message permitted by the TLS stan- 
dard). This may be tractable in a one-off proof, but for our continuously-applied 
proofs we instead consider a smaller set of samples, chosen to cover all branches
in the code. This yields a result that is short of full proof, but still provides much 
higher state space coverage than testing methods. 
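The sampling strategy can be illustrated with a bounded check of update_empty and update_concat on a toy block-buffered state. Everything here is a hypothetical stand-in rather than s2n code: bytes are buffered until a 4-byte block is full and then folded into a chaining value, so message lengths just below, at, and just above block boundaries exercise the "partial block", "block completed", and "several blocks" branches. As with the sampled SAW runs, passing this check is evidence for the sampled sizes only, not a proof for all sizes; equiv_one is omitted because it would also need a monolithic toy specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 4  /* hypothetical block size of the toy compression function */

/* Hypothetical toy state, NOT an s2n structure. */
typedef struct {
    uint32_t chain;       /* chaining value */
    uint8_t  buf[BLOCK];  /* pending partial block */
    size_t   used;        /* number of valid bytes in buf */
    uint64_t total;       /* total bytes absorbed so far */
} blk_state;

static uint32_t blk_compress(uint32_t chain, const uint8_t block[BLOCK]) {
    for (size_t i = 0; i < BLOCK; i++)
        chain = (chain ^ block[i]) * 16777619u;
    return chain;
}

static blk_state blk_init(uint32_t key) {
    blk_state s;
    memset(&s, 0, sizeof s);
    s.chain = 2166136261u ^ key;
    return s;
}

static blk_state blk_update(const uint8_t *msg, size_t len, blk_state s) {
    for (size_t i = 0; i < len; i++) {
        s.buf[s.used++] = msg[i];
        if (s.used == BLOCK) {            /* branch: a block is complete */
            s.chain = blk_compress(s.chain, s.buf);
            s.used = 0;
        }
    }
    s.total += len;
    return s;
}

/* Abstract equality: bytes of buf beyond `used` are not part of the state. */
static int blk_state_eq(const blk_state *a, const blk_state *b) {
    return a->chain == b->chain && a->used == b->used &&
           a->total == b->total && memcmp(a->buf, b->buf, a->used) == 0;
}

/* Checks update_empty and update_concat for every split point of every
   message length up to 2*BLOCK+1, from several reachable starting states.
   Returns the number of failing cases (0 means every sampled case holds). */
static int check_update_lemmas(void) {
    uint8_t msg[2 * BLOCK + 1];
    int failures = 0;
    for (size_t i = 0; i < sizeof msg; i++)
        msg[i] = (uint8_t)(31 * i + 7);
    for (size_t pre = 0; pre < BLOCK; pre++) {       /* vary initial fill */
        blk_state s0 = blk_update(msg, pre, blk_init(0x5eedu));
        blk_state e = blk_update(msg, 0, s0);         /* update_empty */
        if (!blk_state_eq(&e, &s0))
            failures++;
        for (size_t len = 0; len <= sizeof msg; len++)
            for (size_t cut = 0; cut <= len; cut++) { /* update_concat */
                blk_state whole = blk_update(msg, len, s0);
                blk_state split = blk_update(msg + cut, len - cut,
                                             blk_update(msg, cut, s0));
                if (!blk_state_eq(&whole, &split))
                    failures++;
            }
    }
    return failures;
}
```

Comparing only the valid prefix of buf plays the same role as a state invariant: it states which parts of the in-memory state are meaningful to the specification.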

Given the three lemmas above, we then use Coq to prove the following the- 
orem by induction on the list of partitions, ms. 


HMAC key (fold_right concat empty_string ms) = 
HMAC_digest (fold_left (fun (st: state) msg => 
HMAC_update msg st) 
ms 
(HMAC_init key)). 


The theorem establishes the equivalence of the incremental and monolithic 
interfaces for any decomposition of a message into any number of fragments of 
any size. 


2.4 Implementation Verification 


The incremental Cryptol specification is low-level enough that we were able to 
connect it to the s2n HMAC implementation using automated proof techniques. 
As this is the aspect of the verification effort that is critical for integration into 
an active development environment, we go into some detail, first discussing the 
tools that were used and then describing the structure of the proof. 


Tools. We use the Software Analysis Workbench (SAW) to orchestrate this step 
of the proof. SAW is effective both for manipulating the kinds of functional terms 
that arise from Cryptol, and for constructing functional models from imperative 
programs. It can be used to show equivalence of distinct software implemen- 
tations (e.g. an implementation in C and one in Java) or equivalence of an 
implementation and an executable specification. 

SAW uses bounded symbolic execution to translate Cryptol, Java, and C pro- 
grams into logical expressions, and proves properties about the logical expres- 
sions using a combination of rewriting, SAT, and SMT. The result of the bounded 
symbolic execution of the input programs is a pure functional term representing 
the function’s entire semantics. These extracted semantics are then related to 
the Cryptol specifications by way of precondition and postcondition assertions 
on the program state. 

The top-level theorems we prove have some variables that are universally 
quantified (e.g. the key used in HMAC) and others that are parameters we 
instantiate to a constant (e.g. the size of the key). We achieve coverage for 
the latter by running the proof for several parameter instantiations. In some 
cases this is sufficient to cover all cases (e.g. the standard allows only a small 
finite number of key sizes). In others, the space of possible instantiations is 
large enough that fully covering it would yield runtimes too long to fit into the 
developer workflow (for example, messages can be up to 16k bytes long). In such cases,
we consider a smaller set of samples, chosen to cover all branches in the code.
This yields a result that is short of full proof, but still provides much higher
state space coverage than testing methods.

Internally SAW reasons about C programs by first translating them to LLVM. 
For the remainder of the paper we will talk about the C code, although from 
a soundness perspective the C code must be compiled through LLVM for the 
proofs to apply to the compiled code. 


Proof Structure. The functions in the low-level Cryptol specification described 
above share the incremental format of the C program, and also consume argu- 
ments and operate on state that matches the usage of arguments and state in the 
C code. However, the Cryptol specification does not capture the layout of state 
in memory. This separates concerns and allows us to reason about equivalence of 
the monolithic and incremental interfaces in a more tractable purely functional 
setting, while performing the implementation proof in a context in which the 
specification and implementation are already structurally quite similar. 

As an example of this structural similarity, the C function has type: 


int s2n_hmac_update (struct s2n_hmac_state *state, 
const void *in, uint32_t size); 


We define a corresponding Cryptol specification with type: 


hmac_update : {Size} (32 >= width Size) => 
HMAC_state -> [Size][8] -> HMAC_state 


These type signatures look a bit different, but they represent the same thing.
In Cryptol, we list Size first, because it is a type, not a value. This means
that we do not need to independently check that the input buffer (in Cryptol
represented by the type [Size][8]) matches the size input; the Cryptol type
system guarantees it. The type system also enforces that Size is less than 2^32
(i.e., fits in 32 bits), matching the uint32_t type of the size argument in C.
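The constraint 32 >= width Size can be checked numerically. Cryptol's width n is the number of bits needed to represent n, so the constraint holds exactly for sizes from 0 to 2^32 - 1, the range of uint32_t. The C helpers below are a hypothetical re-implementation for illustration; they are not part of Cryptol, SAW, or s2n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C counterpart of Cryptol's `width`: the number of bits
   needed to represent n (so width(0) == 0 and width(255) == 8). */
static unsigned width_u64(uint64_t n) {
    unsigned w = 0;
    while (n != 0) {
        w++;
        n >>= 1;
    }
    return w;
}

/* The Cryptol constraint (32 >= width Size), read as a predicate on Size. */
static int size_satisfies_constraint(uint64_t size) {
    return width_u64(size) <= 32;
}
```

The boundary is exactly where the C type runs out: UINT32_MAX satisfies the constraint, while UINT32_MAX + 1 needs 33 bits and does not.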

We use SAW’s SAWScript language to describe the expected memory layout 
of the C program, and to map the inputs and outputs of the Cryptol function 
to the inputs and outputs of the C program. The following code presents the
SAWScript specification hmac_update_spec for s2n_hmac_update.


1  let hmac_update_spec msg_size cfg = do {
2    (msg_val, msg_pointer) <- ptr_to_fresh_array msg_size i8;
3    (initial_state, state_pointer) <- setup_hmac_state cfg;
4    hmac_invariants initial_state cfg;
5
6    execute_func [state_pointer, msg_pointer, msg_size];
7
8    let final_state =
9      {{ hmac_update_c_state initial_state msg_val }};
10   check_hmac_state state_pointer final_state;
11   hmac_invariants final_state cfg;
12   check_return zero;
13 };
This SAWScript code represents a Hoare triple, with the precondition and
postcondition separated by the body (the execute_func command), which performs
the symbolic execution of the LLVM code using the provided arguments.
Lines 2 and 3 are effectively universal quantification over the triple, setting up 
the values and pointers that match the type needed by the C function. The 
values msg_val and initial_state are referenced in both the C code and the 
Cryptol specification, whereas the pointers exist only on the C side. 

Lines 8-10 capture that the final state resulting from executing the C function 
should be equivalent to the state produced by evaluating the Cryptol specifica- 
tion. Specifically, Lines 8 and 9 capture the output of the Cryptol specification 
(double curly braces denote Cryptol expressions within SAWScript) and Line 10 
asserts that this state matches the C state present in memory at state_pointer. 
This is what ultimately establishes equivalence of the implementation and spec- 
ification. 

The proof is aided by maintaining a collection of state invariants, which are 
assumed to hold in Line 4 and are re-established in Line 11. These are manual 
invariants, but they occur as function specifications rather than appearing inter- 
nal to loops. They only require modification in the event that the meaning of 
the HMAC state changes. 

The msg_size parameter indicates how large of a message this particular 
proof should cover. Because SAW performs a bounded unrolling of the program 
under analysis, each proof must cover one fixed size for each unbounded data 
structure or iterative construct. However, by parameterizing the proof, it can 
easily be repeated for multiple sizes. Furthermore, as described in Sect. 2.3, we 
also show (via the update_concat lemma) that calling update twice with messages m1 and m2
is equivalent to calling it once with m1 concatenated with m2. As a consequence, the
fixed size proofs we perform of update can be composed to guarantee that the 
update function is correct even over longer messages. 

The cfg parameter contains configuration values for each of the six hashes 
that can be used with HMAC. The configuration values of interest to HMAC 
are the input and output sizes of the hash block function. 
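As an illustration of what such a configuration might contain, the C table below lists the block (input) and digest (output) sizes of six common hash functions usable with HMAC. The text does not name the six hashes: MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512 are our assumption, and the hash_cfg type and find_hash_cfg helper are hypothetical rather than SAW or s2n definitions. The sizes themselves are the standard values for these algorithms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical configuration record playing the role of `cfg`. */
typedef struct {
    const char *name;
    size_t block_size;   /* input size of the hash block function, in bytes */
    size_t digest_size;  /* output size, in bytes */
} hash_cfg;

/* Assumed set of six hashes; the sizes are the standard values. */
static const hash_cfg HASH_CFGS[] = {
    {"MD5",      64, 16},
    {"SHA-1",    64, 20},
    {"SHA-224",  64, 28},
    {"SHA-256",  64, 32},
    {"SHA-384", 128, 48},
    {"SHA-512", 128, 64},
};

#define N_HASH_CFGS (sizeof HASH_CFGS / sizeof HASH_CFGS[0])

/* Returns the configuration with the given name, or NULL if absent. */
static const hash_cfg *find_hash_cfg(const char *name) {
    for (size_t i = 0; i < N_HASH_CFGS; i++)
        if (strcmp(HASH_CFGS[i].name, name) == 0)
            return &HASH_CFGS[i];
    return NULL;
}
```

A proof driver could then loop over such a table and run the same parameterized specification once per entry, which is one way to obtain per-hash coverage without duplicating the specification.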

Given the specification of the C function above, we can now verify that the 
implementation satisfies the specification: 


verify m "s2n_hmac_update" 
hash_ovs true (hmac_update_spec msg_size cfg) yices_hash_unint; 


The "s2n_hmac_update" argument specifies the C function that we are veri- 
fying. hash_ovs is a list, defined elsewhere, that contains all of the overrides that 
the verification will use. An override is a specification that will be used in place 
of a particular implementation function and corresponds to what other tools call 
stubs or models. In this case, we’ve overridden all of the C hash functions, stat- 
ing assumptions regarding their use of memory and their equivalence to Cryptol 
implementations of the same hash functions. When the verifier comes across a 
call to one of these hash functions in the C code, it will instead use the provided 
specification. The result is that our proof assumes correct implementation of the 
hash functions. 
The fact that the structure of the low-level Cryptol specification matches
the structure of the C code, coupled with SAW’s use of SMT as the primary 
mechanism for discharging verification conditions, enables a proof that contin- 
ues to work through a variety of code changes. In particular, changes to the 
code in function bodies often require no corresponding specification or proof
script change. Similarly, changes that add fields or change aspects of in-memory 
data structures that are not referenced by the specification do not require proof 
updates. Changes in the API (e.g. function arguments) do require proof script 
changes, but these are typically minor. Fixing a broken proof typically involves 
adding a new state field to the SAW script, updating the Cryptol specification to 
use that field correctly, and then passing the value of that field into the Cryptol 
program in the post-condition. If the Cryptol specification is incorrect, SAW will 
generate counterexamples that can be used to trace through the code and the 
spec together in order to discover the mismatch. 


2.5 Integrating the Proof into Development 


Integration with the s2n CI system mostly took place within the Travis config- 
uration file for s2n. At the time of integration, targets for the build, integration 
testing, and fuzzing on both Linux and OSX already existed. We updated the 
Travis system with Bash scripts that automatically download and install the 
appropriate builds of SAW, Z3, and Yices into the Travis system. These files are 
in the s2n repository and can be reused by anyone under the Apache 2.0 license. 

A Travis CI build can occur on any number of virtual machines, and each 
virtual machine is given an hour to complete. We run our HMAC proofs on 
configurations for six different hashes. For each of these configurations we check
three key sizes in order to exercise the relevant cases in the implementation (keys
shorter than the hash block size are padded, keys of exactly the block size are used
unchanged, and longer keys are hashed). For each of those key sizes we check six
different message sizes. These proofs run in an average of ten minutes. We discovered
that it is best to stay well clear of the 60 min limit imposed by Travis in order
to avoid false negatives due to variations in execution time.
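The three key sizes correspond to the three branches of HMAC key preprocessing in FIPS 198-1 (and RFC 2104): a key shorter than the hash block size is zero-padded to the block size, a key of exactly the block size is used as is, and a longer key is first hashed and the digest is then zero-padded. The C sketch below classifies a key length accordingly and computes the number of proof runs described above (6 hashes × 3 key sizes × 6 message sizes); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Branches of HMAC key preprocessing (FIPS 198-1 / RFC 2104). */
typedef enum { KEY_PADDED, KEY_EXACT, KEY_HASHED } key_case;

/* Hypothetical helper: which branch a key of key_len bytes takes for a hash
   whose block size is block_size bytes. */
static key_case hmac_key_case(size_t key_len, size_t block_size) {
    if (key_len < block_size)
        return KEY_PADDED;   /* K0 = K followed by zero bytes */
    if (key_len == block_size)
        return KEY_EXACT;    /* K0 = K */
    return KEY_HASHED;       /* K0 = H(K) followed by zero bytes */
}

/* Number of proof runs when every hash configuration is checked at every
   key size and every message size. */
static size_t proof_runs(size_t hashes, size_t key_sizes, size_t msg_sizes) {
    return hashes * key_sizes * msg_sizes;
}
```

For a hash with a 64-byte block, key lengths of 63, 64 and 65 bytes would hit the three branches in turn; picking one key size per branch is what makes three key sizes sufficient for branch coverage.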

The proof runs alongside the tests that are present in the s2n repository on 
every build, and if the proof fails a flag is raised just as if a test case were to fail. 


3 Proof of TLS Handshake 


In addition to the HMAC and DRBG proofs, we have proved correctness of the 
TLS state machine implemented in s2n. Specifically, we have proved that (1) it 
implements a subset of TLS 1.2 as defined in IETF RFCs 5246 [21], 5077 [15] and 
6066 [13] and (2) the socket corking API, which optimizes how data is split into 
packets, is used correctly. Formally, we proved that the implementation refines 
a specification (conversely, the specification simulates the implementation). We 
obtained this Cryptol specification, called the RFC specification, by examining
the RFCs and hand-compiling them into a Cryptol file complete with relevant
excerpts from the RFCs. We assume that the TLS handshake as specified in the
RFCs is secure, and we neither formalize nor verify any cryptographic properties
of the specification. In the future, we would like to take a similar approach to 
that described in Sect. 2.2 to link our refinement proof with a specification-level 
security proof for TLS, such as that from miTLS [9]. 

The s2n state machine is designed to ensure correctness and security, pre- 
venting join-of-state-machines vulnerabilities like SMACK [7]. In addition, s2n 
allows increased throughput via the use of TCP socket corking, which combines 
several TLS records into one TCP frame where appropriate. 

The states and transitions of the s2n state machine are encoded explicitly 
as linearized arrays, as opposed to being intertwined with message parsing and 
other logic. This is an elegant decomposition of the problem that makes most of 
the assumptions explicit and enables the use of common logic for message and 
error handling as well as protocol tracking. 

Even with the carefully designed state machine implementation, formal spec- 
ification and verification helped uncover a bug [10]. 


Structure of the TLS Handshake State Machine Correctness Proof. 
The automated proof of correctness of the TLS state machine has two parts 
(Fig. 2). First we establish an equivalence between the two functions² that drive
the TLS handshake state machine in s2n and their respective specifications in 
Cryptol. Again we utilize low-level specifications that closely mirror the shape 
of the C functions. Our end goal, however, is correctness with respect to the 
standards, encoded in the RFC specification in Cryptol. The library implements 
only a subset of the standards, so we can prove only a simulation relation and
not equivalence. Namely, we show that every sequence of messages generated by 
the low-level specification starting from a valid initial state can be generated by 
the RFC specification starting from a related state. The dashed line in Fig. 2 
shows at which points the states match at the implementation and specification 
levels. 


[Figure: three levels, each advancing through the handshake states from CLIENT_HELLO via SERVER_HELLO to APPLICATION_DATA: the RFC spec (handshakeTransition), the low-level spec (conn_set_handshake_type, advance_message), and the C code (s2n_conn_set_handshake_type, s2n_advance_message).]


Fig. 2. Structure of the TLS handshake correctness proof


2 s2n_conn_set_handshake_type and s2n_advance_message. 
RFC-Based Specification of the TLS Handshake. The high-level handshake
protocol specification that captures the TLS state machine is implemented
in Cryptol and accounts for the protocol, message type and direction, as well 
as conditions for branching in terms of abstract connection parameters, but not 
message contents. 

We represent the set of states as unsigned 5-bit integers (Listing 1). The state 
transition relation is represented by a Cryptol function handshakeTransition 
(Listing 2) which, given abstract connection parameters (Listing 3) and the 
current state, returns the next state. If there is no valid next state, the state
machine stutters. The parameters determine the transition to take in each state 
and represent configurations of the end-points as well as contents of the HELLO 
message sent by the other party. We kept the latter separate from the message 
specifications in order to avoid reasoning about message structure and parsing. 
We can still relate the abstract parameters to the implementation because they 
are captured in the connection state. Finally, the message function (Listing 4) 
gives the message type, protocol and direction for every state. 


type State = [5]

(helloRequestSent : State) = 0
(clientHelloSent : State) = 1
(serverHelloSent : State) = 2
...
(serverCertificateStatusSent : State) = 23


Listing 1: Specification of TLS handshake protocol states 


handshakeTransition : Parameters -> State -> State
handshakeTransition params old =
  snd (find fst (True, old) [ (old == from /\ p, to)
                            | (from, p, to) <- valid_transitions ]) where
    valid_transitions =
      [ (helloRequestSent, True, clientHelloSent)
      , (clientHelloSent, True, serverHelloSent)
      , (serverHelloSent, params.keyExchange != DH_anon
                          /\ ~params.sessionTicket, serverCertificateSent)
      ...
      , (serverCertificateStatusSent, ~(keyExchangeNonEphemeral params)
        , serverKeyExchangeSent)
      ]


Listing 2: Specification of the TLS handshake state transition function. Valid 
transitions are encoded as triples (start, transition condition, end). 
type KeyExchange = [3]

(DH_anon : KeyExchange) = 0
...
(DH_RSA : KeyExchange) = 5

type Parameters =
  { keyExchange : KeyExchange      // Negotiated key exchange algorithm
  , sessionTicket : Bit            // The client had a session ticket
  , renewSessionTicket : Bit       // Server decides to renew a session ticket
  , sendCertificateStatus : Bit    // Server decides to send the certificate
                                   // status message
  , requestClientCert : Bit        // Server requests a cert from the client
  , includeSessionTicket : Bit }   // Server includes a session ticket
                                   // extension in SERVER_HELLO


Listing 3: Abstract connection parameters 


message : State -> Message
message = lookupDefault messages (mkMessage noSender data error)
  where messages =
    [ (helloRequestSent, mkMessage server handshake helloRequest)
    , (clientHelloSent, mkMessage client handshake clientHello)
    , (serverHelloSent, mkMessage server handshake serverHello)
    ...
    , (serverChangeCipherSpecSent,
         mkMessage server changeCipherSpec changeCipherSpecMessage)
    , (serverFinishedSent, mkMessage server handshake finished)
    , (applicationDataTransmission, mkMessage both data applicationData)
    ]


Listing 4: Expected message sent/received in each handshake state
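The find-based definition in Listing 2 returns the target of the first triple whose source state matches and whose condition holds, and otherwise returns the old state (the machine stutters). A minimal C rendering of that semantics is sketched below, assuming only the transitions visible in Listing 2 and only the two parameters they mention. The numeric value of serverCertificateSent is elided in Listing 1, so the value used here is hypothetical, as are all C names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t state_t;   /* the spec uses 5-bit states */

/* Values 0-2 follow Listing 1; SERVER_CERTIFICATE_SENT is hypothetical. */
enum {
    HELLO_REQUEST_SENT = 0,
    CLIENT_HELLO_SENT = 1,
    SERVER_HELLO_SENT = 2,
    SERVER_CERTIFICATE_SENT = 3
};

enum { DH_ANON = 0 };      /* as in Listing 3 */

/* Hypothetical subset of the abstract parameters of Listing 3. */
typedef struct {
    uint8_t key_exchange;
    bool session_ticket;
} params_t;

typedef bool (*guard_fn)(const params_t *);

typedef struct {
    state_t from;
    guard_fn guard;
    state_t to;
} transition_t;

static bool always(const params_t *p) { (void)p; return true; }

static bool cert_needed(const params_t *p) {
    return p->key_exchange != DH_ANON && !p->session_ticket;
}

/* Truncated copy of valid_transitions from Listing 2. */
static const transition_t VALID_TRANSITIONS[] = {
    {HELLO_REQUEST_SENT, always, CLIENT_HELLO_SENT},
    {CLIENT_HELLO_SENT, always, SERVER_HELLO_SENT},
    {SERVER_HELLO_SENT, cert_needed, SERVER_CERTIFICATE_SENT},
};

/* First matching transition wins; with no match the machine stutters. */
static state_t handshake_transition(const params_t *p, state_t old) {
    size_t n = sizeof VALID_TRANSITIONS / sizeof VALID_TRANSITIONS[0];
    for (size_t i = 0; i < n; i++) {
        const transition_t *t = &VALID_TRANSITIONS[i];
        if (t->from == old && t->guard(p))
            return t->to;
    }
    return old;
}
```

With DH_RSA (value 5 in Listing 3) and no session ticket, SERVER_HELLO_SENT advances to SERVER_CERTIFICATE_SENT; with DH_anon this truncated table has no matching row, so the machine stutters, whereas the full table would supply the alternative transition.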


Socket Corking. Socket corking is a mechanism for reducing packet fragmenta- 
tion and increasing throughput by making sure full TCP frames are sent when- 
ever possible. It is implemented in Linux and FreeBSD using the TCP_CORK and 
TCP_NOPUSH flags respectively. When the flag is set, the socket is considered 
corked, and the operating system will only send complete (filled up to the buffer 
length) TCP frames. When the flag is unset, the current buffer, as well as all 
future writes, are sent immediately. 

Writing to an uncorked socket is possible, but undesirable as it might result 
in partial packets being sent, potentially reducing throughput. On the other 
hand, forgetting to uncork a socket after the last write can have more serious 
consequences. According to the documentation, Linux limits the duration of 
corking to 200 ms, while FreeBSD has no limit. Hence, leaving a socket corked in 
FreeBSD might result in the data not being sent. We have verified that sockets 
are not corked or uncorked twice in a row. In addition, the structure of the 
message handling implementation in s2n helps us informally establish a stronger 
corking safety property. Because explicit handshake message sequences include
the direction in which each message is sent, we can establish that the socket is (un)corked
appropriately when the message direction changes. In future work we plan to 
expand the scope of our proof to allow us to formally establish full corking 
safety. 
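The verified property (never corking an already-corked socket nor uncorking an uncorked one) is a check over the sequence of cork and uncork events, and the informal argument about message direction suggests one way to generate that sequence. The C sketch below assumes a hypothetical policy, not s2n's actual code: cork before the first message of each run of consecutive outgoing messages, and uncork after the last message of that run. The direction pattern in the test is likewise illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EV_CORK, EV_UNCORK } cork_event;
typedef enum { DIR_CLIENT, DIR_SERVER } direction;

/* True iff, starting from an uncorked socket, the trace never corks twice
   or uncorks twice in a row. */
static bool cork_trace_ok(const cork_event *ev, size_t n) {
    bool corked = false;
    for (size_t i = 0; i < n; i++) {
        if (ev[i] == EV_CORK) {
            if (corked) return false;
            corked = true;
        } else {
            if (!corked) return false;
            corked = false;
        }
    }
    return true;
}

/* Hypothetical policy: for endpoint `self`, cork before each maximal run of
   outgoing messages and uncork after it. `out` needs room for 2*n events.
   Returns the number of events written. */
static size_t cork_events_for(const direction *msgs, size_t n,
                              direction self, cork_event *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        bool ours = msgs[i] == self;
        bool prev_ours = i > 0 && msgs[i - 1] == self;
        bool next_ours = i + 1 < n && msgs[i + 1] == self;
        if (ours && !prev_ours) out[k++] = EV_CORK;
        if (ours && !next_ours) out[k++] = EV_UNCORK;
    }
    return k;
}
```

Because events are derived from changes of direction, every cork is followed by exactly one uncork before the next cork, so the generated traces always pass cork_trace_ok; an injected extra cork (as in the negative test cases described later) would not.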


4 Operationalizing the Proof 


We have integrated the checking of our proof into the build system of s2n, as 
well as the Continuous Integration (CI) system used to check the validity of code 
as it is added to the s2n repository on GitHub. For the green “build passed” 
badge displayed on the s2n GitHub page to appear, all code updates now must 
successfully verify with our proof scripts. Not only do these checks run on
committed code, they are also automatically run on all pull requests to the 
project. This allows the maintainers of s2n to quickly determine the correctness 
of submitted changes when they touch the code that we have proved. In this 
section we discuss aspects of our tooling that were important enablers of this 
integration. 


Proof Robustness. For this integration to work, our proofs must be robust in the 
face of code change. Evolving projects like s2n should not be slowed down by 
the need to update proofs every time the code changes. Too many proof updates 
can lead to significantly slowed development or, in the extreme case, to proofs 
being disabled or ignored in the CI environment. The automated nature of our 
proofs means that they generally need to be changed only in the event of interface
modifications—either to function declarations or state definitions. 

Of these two, state changes are the more common, and can be quite complex
considering that there are usually large, possibly nested, C structs involved (for
example, the s2n_connection struct has around 50 fields, some of which are
structs themselves). To avoid the developer pain that would arise if such struct
updates caused the proof to break, we have structured the verification so that
proof scripts do not require updates when the modified portions of the state do 
not affect the computation being proved. Recall that our proofs are focused on 
functional correctness. Thus in order to affect the proof, a new or modified field 
must influence the computation. Many struct changes target non-security-critical 
portions of the code (e.g. to track additional data for logging) and so do not meet 
this criterion. For such fields we prove that they are handled in a memory-safe
manner and that they do not affect the computation being performed by the 
code the proof script targets. 

In the future, we intend to add the option to perform a “strict” version of 
this state handling logic to SAW, which would ensure that newly added fields are 
not modified at all by the portion of the code being proved. Such a check would 
ensure that the computation being analyzed computes the specified function and 
nothing else and would highlight cases in which new fields introduce undesirable 
data flows (e.g. incorrectly storing sensitive data). However, even such an option
would not replace whole program data flow analysis, which we recommend in 
cases where there is concern about potential incorrect data handling. 
Negative Test Cases. Each of our proofs also includes a series of negative test
cases as evidence that the tools are functioning properly. These test cases patch 
the code with a variety of mistakes that might actually occur and then run the 
same proof scripts using the same build tools to check that the tool detects the 
introduced error. 

Examples of the negative test cases we use include an incorrect modification 
to a side-channel mitigation, running our TLS proofs on a version of the code 
with an extra call to cork and uncork, a version modified to allow early CCS, as 
well as a version with the incomplete handshake bug that we discovered in the 
process of developing the proof. Such tests are critical, both to display the value 
of the proofs, by providing them with realistic bugs to catch, and as a defense 
against possible bugs in the tool that may be introduced as it is updated. 


Proof Metrics. We also report real-time proof metrics. Our proof scripts print 
out JSON encoded statistics into the Travis logs. From there, we have developed 
an in-browser tool that scrapes the Travis logs for the project, compiling the 
relevant statistics into easily consumable charts and tables. The primary metrics 
we track are: (1) the number of lines of code that are analyzed by the proof (which 
increases as we develop proofs for more components of s2n), and (2) the number 
of times the verified code has been changed and re-analyzed (which tracks the 
ongoing value of the proof). This allows developers to easily track the impact of 
the proofs over time. 

Since deployment of the proof to the CI system in November 2016, our
proofs have been replayed 956 times. This number does not account for proof
replays performed in forks of the repository. We have had to update the proof three
times. In all cases the proof update was complete before the code review process 
finished. Not all of these runs involved modification to the code that our proofs
were about; however, each of the runs increased the confidence of the maintainers
in the relevant code changes, and each run reestablishes the correctness of the 
code to the public, who may not be aware of what code changed at each commit. 

HMAC and DRBG each took roughly 3 months of engineering effort. The 
TLS handshake verification took longer at 8 months, though some of that time 
involved developing tool extensions to support reasoning about protocols. At 
the start of each project, the proof-writers were familiar with the proof tools but 
not with the algorithms or the s2n implementations of them. The effort amounts 
listed above include understanding the C code, writing the specifications in Cryp- 
tol, developing the code-spec proofs using SAW, the CI implementation work, 
and the process of merging the proof artifacts into the upstream code-base. 


5 Conclusion 


In this case study we have described the development and operation in practice of 
a continuously checked proof ensuring key properties of the TLS implementation 
used by many Amazon and AWS services. Based on several previous attempts 
to prove the correctness of s2n that either required too much developer inter- 
action during modifications or where symbolic reasoning tools did not scale, we 
developed a proof structure that nearly eliminates the need for developers to
understand or modify the proof following modifications to the code. 
Abstract. Liveness violation bugs are notoriously hard to detect, especially
due to the difficulty inherent in applying formal methods to real-world
programs. We present a generic and practically useful liveness
property which defines a program as being live as long as it will eventu- 
ally either consume more input or terminate. We show that this property 
naturally maps to many different kinds of real-world programs. 

To demonstrate the usefulness of our liveness property, we also present 
an algorithm that can be efficiently implemented to dynamically find las- 
sos in the target program’s state space during Symbolic Execution. This 
extends Symbolic Execution, a well known dynamic testing technique, 
to find a new class of program defects, namely liveness violations, while 
only incurring a small runtime and memory overhead, as evidenced by 
our evaluation. The implementation of our method found a total of five 
previously undiscovered software defects in BusyBox and the GNU Core- 
utils. All five defects have been confirmed and fixed by the respective 
maintainers after shipping for years, most of them well over a decade. 


Keywords: Liveness analysis · Symbolic Execution · Software testing ·
Non-termination bugs


1 Introduction 


Advances in formal testing and verification methods, such as Symbolic Execution 
[10-12, 22-24, 42, 49] and Model Checking [5, 6, 13, 17, 21, 27, 29, 30, 43, 50], have
enabled the practical analysis of real-world software. Many of these approaches 
are based on the formal specification of temporal system properties using sets of 
infinite sequences of states [1], which can be classified as either safety, liveness, or 
properties that are neither [31]. (However, every linear-time property can be rep- 
resented as the conjunction of a safety and a liveness property.) This distinction 
is motivated by the different techniques employed for proving or disproving such 
© The Author(s) 2018 
properties. In practical applications, safety properties are prevalent. They constrain
the finite behavior of a system, ensuring that “nothing bad” happens, and
can therefore be checked by reachability analysis. Hence, efficient algorithms 
and tools have been devised for checking such properties that return a finite 
counterexample in case of a violation [34]. 

Liveness properties, on the other hand, do not rule out any finite behavior 
but constrain infinite behavior to eventually do “something good” [2]. Their 
checking generally requires more sophisticated algorithms since they must be 
able to generate (finite representations of) infinite counterexamples. Moreover, 
common finite-state abstractions that are often employed for checking safety
generally do not preserve liveness properties.

While it may be easy to create a domain-specific liveness property (e.g., “a 
GET / HTTP/1.1 must eventually be answered with an HTTP/1.1 {status}”), it
is much harder to formulate general liveness properties. We tackle this challenge 
by proposing a liveness property based on the notion of programs as implemen- 
tations of algorithms that transform input into output: 


Definition 1. A program is live if and only if it always eventually consumes 
input or terminates. 


By relying on input instead of output as the measure of progress, we circumvent
difficulties caused by many common programming patterns such as printing
status messages or logging the current state. 


Detection. We present an algorithm to detect violations of this liveness property 
based on a straightforward idea: Execute the program and check after each 
instruction if the whole program state has been encountered before (identical 
contents of all registers and addressable memory). If a repetition is found that 
does not consume input, it is deterministic and will keep recurring ad infinitum. 
To facilitate checking real-world programs, we perform the search for such lassos 
in the program’s state space while executing it symbolically. 
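The detection idea can be illustrated on a toy, fully concrete program. The following Python sketch is a hypothetical illustration, not the authors' implementation: it models a program as a deterministic `step` function over hashable states that also reports whether an input was consumed, keeps every state seen since the last input, and reports the first state that repeats. The names `find_lasso`, `yes_like` and `cat_like` are invented for this example, and the `max_steps` bound merely stands in for a time limit.

```python
from typing import Any, Callable, Iterator, Optional, Tuple

State = Any
# step(state, inputs) -> (next_state, consumed_input); None means "exited".
Step = Callable[[State, Iterator[int]], Tuple[Optional[State], bool]]


def find_lasso(initial: State, step: Step, inputs: Iterator[int],
               max_steps: int = 10_000) -> Optional[State]:
    """Return a state that repeats with no input consumed in between,
    or None if the program exits or the step bound is reached."""
    seen = {initial}            # states visited since the last input read
    state = initial
    for _ in range(max_steps):
        state, consumed = step(state, inputs)
        if state is None:       # termination counts as live
            return None
        if consumed:
            seen = {state}      # input is progress: forget older states
        elif state in seen:
            return state        # deterministic, input-free repetition
        else:
            seen.add(state)
    return None                 # bound reached without a violation


def yes_like(state: State, inputs: Iterator[int]) -> Tuple[Optional[State], bool]:
    """Hypothetical program: reads one value, then loops without input."""
    pc, v = state
    if pc == 0:
        return (1, next(inputs)), True
    return (1, (v + 1) % 4), False   # e.g. cycling through status output


def cat_like(state: State, inputs: Iterator[int]) -> Tuple[Optional[State], bool]:
    """Hypothetical program: consumes input until it runs out, then exits."""
    try:
        return ("copied", next(inputs)), True
    except StopIteration:
        return None, False
```

Here `find_lasso((0, 0), yes_like, iter([7]))` returns the repeated state `(1, 0)`, mirroring the behavior of yes, while `find_lasso(("start", None), cat_like, iter([1, 2, 3]))` returns `None` because the program exits after exhausting its input. Storing and comparing full states in this way is exactly the cost that Sect. 4 sets out to reduce.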


Examples. Some examples that show the generality of this liveness property 
are: 1. Programs that operate on input from files and streams, such as cat, 
sha256sum or tail. This kind of program is intended to continue running as 
long as input is available. In some cases this input may be infinite (e.g., cat -). 
2. Reactive programs, such as calc.exe or nginx, wait for events to occur. Once 
an event occurs, a burst of activity computes an answer, before the software 
goes back to waiting for the next event. Often, an event can be sent to signal a 
termination request. Such events are input just as much as the contents of a file 
read by the program are input. 

In rare cases, a program can intuitively be considered live without satisfying 
our liveness property. Most prominent is the yes utility, which will loop forever, 
only printing output. In our experience, the set of useful programs 
that intentionally allow for an infinite trace consuming only finite input is very 
small and the violation of our liveness property can, in such cases, easily be 
recognized as intentional. Our evaluation supports this claim (cf. Sect. 6). 
Bugs and Violations. The implementation of our algorithm detected a total of 
five unintended and previously unknown liveness violations in the GNU Coreutils 
and BusyBox, all of which had been in the respective codebases for between 
7 and 19 years. All five bugs have been confirmed and fixed within days. The 
three implementations of yes we tested as part of our evaluation were correctly 
detected to not be live. We also automatically generated liveness-violating input 
programs for all sed interpreters. 


1.1 Key Contributions 
This paper presents four key contributions: 


1. The definition of a generic liveness property for real-world software. 

2. An algorithm to detect its violations. 

3. An open-source implementation of the algorithm as an extension to the 
Symbolic Execution engine KLEE [10], available on GitHub¹. 

4. An evaluation of the above implementation on a total of 354 tools from the 
GNU Coreutils, BusyBox and toybox, which has so far detected five previously 
unknown defects in widely deployed real-world software. 


1.2 Structure 


We discuss related work (Sect. 2), before formally defining our liveness property 
(Sect. 3). Then, we describe the lasso detection algorithm (Sect. 4), demonstrate 
the practical applicability by implementing the algorithm for the SymEx engine 
KLEE (Sect. 5) and evaluate it on three real-world software suites (Sect. 6). We 
finally discuss the practical limitations (Sect. 7) and conclude (Sect. 8). 


2 Related Work 


General liveness properties [2] can be verified by proof-based methods [40], which 
generally require heavy user support. In contrast, our work is based upon the 
state-exploration approach to verification. Another prominent approach to verify 
the correctness of a system with respect to its specification is automatic Model 
Checking using automata- or tableau-based methods [5]. 

In order to combat state-space explosion, many optimization techniques have 
been developed. Most of these, however, are only applicable to safety properties. 
For example, Bounded Model Checking (BMC) of software is a well-established 
method for detecting bugs and runtime errors [7,18,19] that is implemented by 
a number of tools [16,38]. These tools investigate finite paths in programs by 
bounding the number of loop iterations and the depth of function calls, which is 
not necessarily suited to detect the sort of liveness violations we aim to discover. 
There is work trying to establish completeness thresholds of BMC for (safety 
and) liveness properties [33], but these are useful only for comparatively small 
systems. Moreover, most BMC techniques are based on Boolean SAT rather than 
on SMT, which is required for dealing with the intricacies of real-world software. 

1 https://github.com/COMSYS/SymbolicLivenessAnalysis. 

Termination is closely related to liveness in our sense, and has been inten- 
sively studied. It boils down to showing the well-foundedness of the program’s 
transition relation by identifying an appropriate ranking function. In recent 
works, this is accomplished by first synthesizing conditional termination proofs 
for program fragments such as loops, and then combining sub-proofs using a 
transformation that isolates program states for which termination has not been 
proven yet [8]. A common assumption in this setting is that program variables 
are mathematical integers, which eases reasoning but is generally unsound. A 
notable exception is AProVE [28], an automated tool for termination and com- 
plexity analysis that takes (amongst others) LLVM intermediate code and builds 
a SymEx graph that combines SymEx and state-space abstraction, covering both 
byte-accurate pointer arithmetic and bit-precise modeling of integers. However, 
advanced liveness properties, floating point values, complex data structures and 
recursive procedures are unsupported. While a termination proof is a witness for 
our liveness property, an infinite program execution constitutes neither witness 
nor violation. Therefore, non-termination proof generators, such as TNT [26], 
while still related, are not relevant to our liveness property. 

The authors of Bolt [32] present an entirely different approach, by proposing 
an in-vivo analysis and correction method. Bolt does not aim to prove that a 
system terminates or not, but rather provides a means to force already running 
binaries out of a long-running or infinite loop. To this end, Bolt can attach to an 
unprepared, running program and will detect loops through memory snapshot- 
ting, comparing snapshots to a list of previous snapshots. A user may then choose 
to forcefully break the loop by applying one of two strategies as a last-resort 
option. Previous research into in-vivo analysis of hanging systems, Looper [9], 
attempts to prove that a given process has run into an infinite loop. Similarly to 
Bolt, Looper attaches to a binary, but then uses Concolic Execution (ConEx) to 
gain insight into the remaining possible memory changes for the process. This 
allows for a diagnosis of whether the process is still making progress and will 
eventually terminate. Both approaches are primarily aimed at understanding or 
handling an apparent hang, not at proactively searching for unknown defects. 

In [35], the authors argue that non-termination has been researched signifi- 
cantly less than termination. Similar to [14,25], they employ static analysis to 
find every Strongly Connected SubGraph (SCSG) in the Control Flow Graph 
(CFG) of a given program. Here, a Max-SMT solver is used to synthesize a for- 
mulaic representation of each node, which is both a quasi-invariant (i.e., always 
holding after it held once) and edge-closing (i.e., not allowing a transition that 
leaves the node’s SCSG to be taken). If the solver succeeds for each node in a 
reachable SCSG, a non-terminating path has been found. 

In summary, the applicability of efficient methods for checking liveness in 
our setting is hampered by restrictions arising from the programming model, the 
supported properties (e.g., only termination), scalability issues, missing support 
for non-terminating behavior or false positives due to over-approximation. In the 
following, we present our own solution to liveness checking of real-world software. 
3 Liveness 


We begin by formally defining our liveness property following the approach by 
Alpern and Schneider [1-3], which relies on the view that liveness properties do 
not constrain the finite behaviors but introduce conditions on infinite behaviors. 
Here, possible behaviors are given by (edge-labeled) transition systems. 


Definition 2 (Transition System). A transition system $T$ is a 4-tuple 
$(S, \mathit{Act}, \rightarrow, I)$: 

- $S$ is a finite set of states, 
- $\mathit{Act}$ is a finite set of actions, 
- $\rightarrow \subseteq S \times \mathit{Act} \times S$ is a transition relation (written $s \xrightarrow{a} s'$), and 
- $I \subseteq S$ is the set of initial states. 

For $s \in S$, the set of outgoing actions is denoted by $\mathit{Out}(s) = \{a \in \mathit{Act} \mid 
s \xrightarrow{a} s' \text{ for some } s' \in S\}$. Moreover, we require $T$ to be deadlock free, i.e., 
$\mathit{Out}(s) \neq \emptyset$ for each $s \in S$. A terminal state is indicated by a self-loop involving 
the distinguished action $\downarrow \in \mathit{Act}$: if $\downarrow \in \mathit{Out}(s)$, then $\mathit{Out}(s) = \{\downarrow\}$. 


The self-loops ensure that all executions of a program are infinite, which is 
necessary as terminal states indicate successful completion in our setting. 


Definition 3 (Executions and Traces). An (infinite) execution is a sequence 
of the form $s_0 a_1 s_1 a_2 s_2 \ldots$ such that $s_0 \in I$ and $s_i \xrightarrow{a_{i+1}} s_{i+1}$ for every $i \in \mathbb{N}$. 
Its trace is given by $a_1 a_2 \ldots \in \mathit{Act}^\omega$. 


Definition 4 (Liveness Properties) 


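- A linear-time property over $\mathit{Act}$ is a subset of $\mathit{Act}^\omega$. 
- Let $\Pi \subseteq \mathit{Act}$ be a set of productive actions such that $\downarrow \in \Pi$. The $\Pi$-liveness 
property is given by $\{a_1 a_2 \ldots \in \mathit{Act}^\omega \mid a_i \in \Pi \text{ for infinitely many } i \in \mathbb{N}\}$. 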


A liveness property is generally characterized by the requirement that each 
finite trace prefix can be extended to an infinite trace that satisfies this property. 
In our setting, this means that in each state of a given program it is guaranteed 
that eventually a productive action will be performed. That is, infinitely many 
productive actions will occur during each execution. As $\downarrow$ is considered 
productive, terminating computations are live. This differs from the classical setting 
where terminal states are usually considered as deadlocks that violate liveness. 
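Because a violating trace must eventually be lasso-shaped (Sect. 4), $\Pi$-liveness can be decided for an ultimately periodic trace $p\,c^\omega$ by inspecting only the loop $c$: the actions of $p$ occur finitely often, those of $c$ infinitely often. The following minimal Python sketch assumes that actions are plain strings and uses the hypothetical name `TERM` for the action $\downarrow$; `is_pi_live` is likewise an illustrative name.

```python
from typing import Sequence, Set

TERM = "term"  # hypothetical stand-in for the termination action (down arrow)


def is_pi_live(stem: Sequence[str], loop: Sequence[str],
               productive: Set[str]) -> bool:
    """Decide Pi-liveness of the ultimately periodic trace stem . loop^omega."""
    if not loop:
        raise ValueError("the loop of a lasso must be non-empty")
    if TERM not in productive:
        raise ValueError("the termination action must be productive")
    # Stem actions occur only finitely often, so the stem cannot matter;
    # loop actions occur infinitely often, so one productive action suffices.
    return any(action in productive for action in loop)


PI = {"read", TERM}  # example: reading input and terminating are productive
```

For example, a trace that reads once and then only prints forever (as yes does) is rejected, whereas a trace ending in the $\downarrow$ self-loop, or one that keeps reading, is accepted.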

We assume that the target machine is deterministic w.r.t. its computations 
and model the consumption of input as the only source of non-determinism. This 
means that if the execution is in a state in which the program will execute a non- 
input instruction, only a single outgoing (unproductive) transition exists. If the 
program is to consume input on the other hand, a (productive) transition exists 
for every possible value of input. We only consider functions that provide at least 
one bit of input as input functions, which makes $\downarrow$ the only productive action 
that is also deterministic, that is, the only productive transition which must be 
taken once the state it originates from is reached. More formally, 
$|\mathit{Out}(s)| > 1 \Rightarrow \mathit{Out}(s) \subseteq \Pi \setminus \{\downarrow\}$. Thus, if a (sub-)execution $s_i a_{i+1} s_{i+1} \ldots$ contains no 
productive transitions beyond $\downarrow$, it is fully specified by its first state $s_i$, as there 
will only ever be a single transition to be taken. 
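The modeling assumptions above (deadlock freedom, terminal self-loops, and branching only on input) can be checked mechanically on a small, explicitly given transition system. This Python sketch is an illustration under our own encoding: states and actions are strings, `TERM` stands for $\downarrow$, and the names `out` and `check_assumptions` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

TERM = "term"  # hypothetical name for the termination action (down arrow)
# state -> list of (action, successor state)
TransitionSystem = Dict[str, List[Tuple[str, str]]]


def out(ts: TransitionSystem, s: str) -> Set[str]:
    """Out(s): the set of actions leaving state s."""
    return {a for a, _ in ts[s]}


def check_assumptions(ts: TransitionSystem, productive: Set[str]) -> List[str]:
    """List every violated modeling assumption (empty if all hold)."""
    problems = []
    for s, edges in ts.items():
        acts = out(ts, s)
        if not acts:
            problems.append(f"{s}: deadlock")
        if TERM in acts and edges != [(TERM, s)]:
            problems.append(f"{s}: terminal state is not a pure self-loop")
        # |Out(s)| > 1 implies Out(s) is a subset of productive minus {TERM}
        if len(acts) > 1 and not acts <= productive - {TERM}:
            problems.append(f"{s}: branching on a non-input action")
    return problems
```

A state that reads one of two input values, followed by a single unproductive step and a terminal self-loop, passes all checks; a state offering two unproductive actions is flagged as branching on a non-input action.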

Similarly, we assume that the target machine has finite memory. This implies 
that the number of possible states is finite: $|S| \in \mathbb{N}$. Although we model each 
possible input with its own transition, input words are finite too; therefore, $\mathit{Act}$ 
is finite, and hence so is $\mathit{Out}(s)$ for each $s \in S$. 


4 Finding Lassos 


Any trace $t$ that violates a liveness property must necessarily consist of a finite 
prefix $p$ that leads to some state $s \in S$, after which no further productive 
transitions are taken. Therefore, $t$ can be written as $t = pq$, where $p$ is finite and may 
contain productive actions, while $q$ is infinite and does not contain productive 
actions. Since $S$ is a finite set and every state from $s$ onward will only have 
a single outgoing transition and successor, $q$ must contain a cycle that repeats 
itself infinitely often. Therefore, $q$ in turn can be written as $q = f c^\omega$, where $f$ is 
finite and $c$ non-empty. Due to its shape, we call this a lasso with $pf$ the stem 
and $c$ the loop. 
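Since every state after the last productive transition has exactly one successor, the remainder of the run is the iteration of a function over a finite set, and the split into $f$ and $c$ can be computed by remembering where each state first occurred. The following Python sketch is illustrative only: `split_lasso` and the successor function in the usage example are hypothetical, and it returns the visited states rather than the actions between them.

```python
from typing import Any, Callable, Dict, List, Tuple


def split_lasso(start: Any, succ: Callable[[Any], Any]) -> Tuple[List[Any], List[Any]]:
    """Split the input-free run start, succ(start), ... into (f, c) such that
    the run is f followed by c repeated forever. Assumes hashable states and
    a finite state space, so that some state must eventually repeat."""
    first_seen: Dict[Any, int] = {}
    run: List[Any] = []
    s = start
    while s not in first_seen:
        first_seen[s] = len(run)
        run.append(s)
        s = succ(s)              # deterministic: exactly one successor
    loop_start = first_seen[s]   # the first repeated state opens the loop
    return run[:loop_start], run[loop_start:]
```

For instance, iterating $s \mapsto 2s \bmod 10$ from 3 visits 3, 6, 2, 4, 8, 6, ..., so `split_lasso(3, lambda s: (2 * s) % 10)` yields `([3], [6, 2, 4, 8])`.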

Due to the infeasible computational complexity of checking our liveness prop- 
erty statically (in the absence of input functions, it becomes the finite-space 
halting problem), we leverage a dynamic analysis that is capable of finding any 
violation in bounded time and works incrementally to report violations as they 
are encountered. We do so by searching the state space for a lasso, whose loop 
does not contain any productive transitions. This is naively achieved in the 
dynamic analysis by checking whether any other state visited since the last 
productive transition is equal to the current one. In this case, the current state 
deterministically transitions back to itself, i.e., is part of the loop. 

To implement this idea without prohibitively expensive resource usage, two 
main challenges must be overcome: 1. Exhaustive exploration of all possible 
inputs is infeasible for nontrivial cases. 2. Comparing states requires up to $2^{64}$ 
byte comparisons on a 64 bit computer. In the rest of this section, we discuss how 
to leverage SymEx to tackle the first problem (Sect. 4.1) and how to cheapen 
state comparisons with specially composed hash-based fingerprints (Sect. 4.2). 


4.1 Symbolic Execution 


Symbolic Execution (SymEx) has become a popular dynamic analysis technique 
whose primary domain is automated test case generation and bug detection 
[10-12,15,22,41,42,49]. The primary intent behind SymEx is to improve upon 
exhaustive testing by symbolically constraining inputs instead of iterating over 
all possible values, which makes it a natural fit for the first challenge above. 


Background. The example in Fig. 1 tests whether the variable x is in the range 
from 5 to 99 by performing two tests before returning the result. As x is the 
input to this snippet, it is initially assigned an unconstrained symbolic value. 
Upon branching on x < 5 in line 2, the SymEx engine needs to consider two 
cases: One in which x is now constrained to be smaller than 5 and another one 
in which it is constrained to not be smaller than 5. On the path on which x < 5 
held, ok is then assigned false, while the other path does not execute that 
instruction. Afterwards, both paths encounter the branch if(x >= 100) in line 
4. Since the constraint set {x < 5, x ≥ 100} is unsatisfiable, the leftmost of the 
four resulting possibilities is unreachable and therefore not explored. The three 
remaining paths reach the return statement in line 6. We call the set of currently 
active constraints the Path Constraint (PC). The PC is usually constructed so as 
to contain constraints in the combined theories of quantifier-free 
bit-vectors, finite arrays and floating point numbers². 

[Fig. 1 (figure residue): the snippet below, annotated with its SymEx tree. The root {} splits at line 2 into {x < 5} and {x ≥ 5}; at line 4, these split into {x < 5, x ≥ 100} (unsatisfiable, not explored), {x < 5, x < 100}, {x ≥ 5, x ≥ 100} and {x ≥ 5, x < 100}; the three satisfiable ones each reach return ok;.] 

    1  bool ok = true;
    2  if(x < 5)
    3      ok = false;
    4  if(x >= 100)
    5      ok = false;
    6  return ok;

Fig. 1. SymEx tree showing the execution of a snippet with two ifs. The variable x is 
symbolic and one state is unreachable, as its Path Constraint is unsatisfiable. 
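The pruning of the infeasible path in Fig. 1 can be reproduced with a toy path enumerator. In this Python sketch, a brute-force search over the hypothetical domain -128 to 127 replaces the SMT solver, an assumption made only to keep the example self-contained; a witness for each feasible path constraint is run through a Python transcription of the snippet to obtain the value of ok returned on that path. The helper names are illustrative.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Constraint = Tuple[str, Callable[[int], bool]]

# Assumption: brute force over a small domain stands in for an SMT solver.
DOMAIN = range(-128, 128)


def witness(pc: List[Constraint]) -> Optional[int]:
    """Some x satisfying every constraint in pc, or None if unsatisfiable."""
    return next((x for x in DOMAIN if all(p(x) for _, p in pc)), None)


def snippet(x: int) -> bool:
    """Python transcription of the Fig. 1 code."""
    ok = True
    if x < 5:
        ok = False
    if x >= 100:
        ok = False
    return ok


def explore_fig1() -> List[Tuple[List[str], bool]]:
    """Feasible paths of the snippet, each with the value of ok it returns."""
    line2 = [("x < 5", lambda x: x < 5), ("x >= 5", lambda x: x >= 5)]
    line4 = [("x >= 100", lambda x: x >= 100), ("x < 100", lambda x: x < 100)]
    paths = []
    for c2, c4 in product(line2, line4):
        w = witness([c2, c4])
        if w is None:           # e.g. {x < 5, x >= 100}: never explored
            continue
        paths.append(([c2[0], c4[0]], snippet(w)))
    return paths
```

Calling `explore_fig1()` yields three paths; only the one constrained by x ≥ 5 and x < 100 returns true.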


Symbolic Execution of the Abstract Transition System. By using sym- 
bolic values, a single SymEx state can represent a large number of states in the 
transition system. We require that the SymEx engine, as is commonly done, 
never assigns a symbolic value (with more than one satisfying model) to the 
instruction pointer. Since the productive transitions of the transition system are 
derived from instructions in the program code, this means that each instruction 
that the SymEx engine performs either corresponds to a number of productive, 
input-consuming transitions, or a number of unproductive, not input-consuming 
transitions. Therefore, any lasso in the SymEx of the program is also a lasso in 
the transition system (the $\downarrow$ transition requires trivial special treatment). 

To ensure that the opposite is also true, a simple and common optimization 
must be implemented in the SymEx engine: Only add branch conditions to the 
PC that are not already implied by it. The taken branch condition is implied by 
the PC iff exactly one of the two branch directions is satisfiable, which the SymEx 
engine (or rather its SMT solver) needs to check in any case. Thereby it is 
guaranteed that if the SymEx state is part of a loop in the transition system, not 
just the concrete values, but also the symbolic values will eventually converge 
towards a steady state. Again excluding trivial special treatment for program 
termination, a lasso in the transition system thus entails a lasso in the SymEx 
of the program. 

2 While current SymEx engines and SMT solvers still struggle with the floating point 
theory in practice [37], the SMT problem is decidable for this combination of theories. 
Bitblasting [20] gives a polynomial-time reduction to the Boolean SAT problem. 
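The effect of the implied-condition optimization can be seen on a loop whose guard does not change. In the Python sketch below, satisfiability is decided by brute force over a hypothetical 8-bit input domain instead of an SMT solver, and `sat`, `fork` and `x_gt_10` are illustrative names. The first check of `x > 10` splits the state, but every later check finds the condition implied and reuses the path constraint unchanged, so successive loop iterations produce identical SymEx states that a state comparison can match.

```python
from typing import Callable, List, Tuple

Pred = Callable[[int], bool]

# Assumption: brute force over a hypothetical 8-bit input replaces the solver.
DOMAIN = range(256)


def sat(pc: List[Pred]) -> bool:
    return any(all(p(x) for p in pc) for x in DOMAIN)


def fork(pc: List[Pred], cond: Pred) -> List[Tuple[bool, List[Pred]]]:
    """Successor (branch direction, path constraint) pairs for `cond`."""
    def neg(x: int) -> bool:
        return not cond(x)

    can_true, can_false = sat(pc + [cond]), sat(pc + [neg])
    if can_true and can_false:
        # Genuine split: each direction records its new constraint.
        return [(True, pc + [cond]), (False, pc + [neg])]
    # Only one direction is feasible, i.e., it is implied by the PC: keep the
    # PC unchanged so that repeated loop iterations yield identical states.
    return [(can_true, pc)]


def x_gt_10(x: int) -> bool:
    return x > 10               # guard of a loop that never modifies x


(_, pc1), _ = fork([], x_gt_10)  # first check: the state splits
again = fork(pc1, x_gt_10)       # later check: implied, PC is reused as-is
```

Without the optimization, each iteration would append another copy of `x > 10`, and the symbolic part of the state would never repeat even though the concrete behavior does.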


4.2 Fingerprinting 


To reduce the cost of each individual comparison between two states, we take 
an idea from hash maps by computing a fingerprint p for each state and com- 
paring those. A further significant improvement is possible by using a strong 
cryptographic hash algorithm to compute the fingerprint: Being able to rely 
(with very high probability) on the fingerprint comparison reduces the memory 
requirements, as it becomes unnecessary to store a list of full predecessor states. 
Instead, only the fingerprints of the predecessors need to be kept. 
Recomputing the fingerprint after each instruction would still require a full 
scan over the whole state at each instruction however. Instead, we enable effi- 
cient, incremental computation of the fingerprint by not hashing everything, but 
rather hashing many small fragments, and then composing the resulting hashes 
using bitwise xor. Then, if an instruction attempts to modify a fragment $f$, it is 
easy to compute the old and new fragment hashes. The new fingerprint $p_{new}$ can 
then be computed as $p_{new} := p_{old} \oplus \mathrm{hash}(f_{old}) \oplus \mathrm{hash}(f_{new})$. Changing a single 
fragment therefore requires only two hash computations and two bitwise xors on 
constant-size bit strings: one to remove the old fragment from the composite and 
one to insert the new one. Each incremental fingerprint update only modifies a small 
number of fragments statically bounded by the types used in the program. 
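The xor-based composition can be sketched in Python with the standard library's `hashlib.blake2b` (256 bit digests, matching the choice described in Sect. 5.2). The class below is a hypothetical illustration, not the KLEE extension itself: it fingerprints a map from addresses to byte values and updates the fingerprint on each write by removing the old fragment hash and inserting the new one. The fragment layout used here is an assumption.

```python
import hashlib
from typing import Dict


def fragment_hash(addr: int, byte: int) -> int:
    """256 bit BLAKE2b hash of a (tag, address, value) fragment, as an int."""
    fragment = b"\x01" + addr.to_bytes(8, "big") + bytes([byte])  # assumed layout
    digest = hashlib.blake2b(fragment, digest_size=32).digest()
    return int.from_bytes(digest, "big")


class Fingerprint:
    """Xor of the fragment hashes of a byte-addressed memory (hypothetical)."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = dict(memory)
        self.value = 0
        for addr, byte in self.memory.items():
            self.value ^= fragment_hash(addr, byte)

    def write(self, addr: int, byte: int) -> None:
        """Update one byte with two hashes and two xors, not a full rescan."""
        if addr in self.memory:           # remove the old fragment, if any
            self.value ^= fragment_hash(addr, self.memory[addr])
        self.memory[addr] = byte
        self.value ^= fragment_hash(addr, byte)  # insert the new fragment
```

Because xor is associative, commutative and self-inverse, the incrementally maintained value equals the fingerprint recomputed from scratch, and writing the old value back restores the previous fingerprint exactly.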


4.3 Algorithm Overview 


The proposed algorithm explores as much of the input space as possible within 
a specified amount of time, using SymEx to cover large portions of the input 
space simultaneously. Every SymEx state is efficiently checked against all its 
predecessors by comparing their fingerprints. 


5 Efficient Implementation of the Algorithm 


To develop the algorithm presented in the previous section into a practically 
useful program, we decided to build upon the KLEE SymEx engine [10], with 
which many safety bugs in real-world programs have been previously found 
[10,15,41]. As KLEE in turn builds upon the LLVM compiler infrastructure 
[36], this section begins with a short introduction to LLVM Intermediate 
Representation (IR) (Sect. 5.1), before explaining how the fragments whose hashes 
make up the fingerprint can be implemented (Sect. 5.2) and how to track finger- 
prints (Sect. 5.3). Finally, we detail a technique to avoid as many comparisons 
as possible (Sect. 5.4). 
5.1 LLVM Intermediate Representation 


LLVM Intermediate Representation (IR) was designed as a typed, low-level lan- 
guage independent from both (high-level) source language and any specific tar- 
get architecture, to facilitate compiler optimizations. It operates on an unlim- 
ited number of typed registers of arbitrary size, as well as addressable memory. 
Instructions in IR operate in Static Single Assignment (SSA) form, i.e., registers 
are only ever assigned once and never modified. The language also has functions, 
which have a return type and an arbitrary number of typed parameters. IR has 
only global scope and function scope; it features no block scope. 

Addressable objects are either global variables, or explicitly allocated, e.g., 
using malloc (cleaned up with free) or alloca (cleaned up on return from 
function). 


[Fig. 2 (figure residue): fragment layouts for the six kinds of fragments: Main Memory (Concrete), Main Memory (Symbolic), Register (Concrete), Register (Symbolic), Argument (Concrete), Argument (Symbolic).] 


Fig. 2. Six kinds of fragments suffice to denote all possible variants. Symbolic values 
are written as serialized symbolic expressions consisting of all relevant constraints. All 
other fields only ever contain concrete values, which are simply used verbatim. Fields 
of dynamic size are denoted by a ragged right edge. 


5.2 Fragments 


When determining what is to become a fragment, i.e., an atomic portion of a 
fingerprint, two major design goals should be taken into consideration: 


1. Collisions between hashed fragments should not occur, as they would expunge 
one another from the fingerprint. This goal can be decomposed further: 

(a) The hashing algorithm should be chosen in a manner that makes collisions 
so unlikely as to be nonexistent in practice. 

(b) The fragments themselves need to be generated in a way that ensures that 
no two different fragments have the same representation, as that would 
of course cause their hashes to be equal as well. 

2. Fragment sizes should be as close as possible to what will be modified by the 
program in one step. Longer fragments are more expensive to compute and 
hash, and shorter fragments become invalidated more frequently. 


Avoiding Collisions. In order to minimize the risk of accidental collisions, 
which would reduce the efficacy of our methodology, we chose the cryptographic 
hash function BLAKE2b [4] to generate 256 bit hashes, providing 128 bit collision 
resistance. To the best of our knowledge, there are currently no relevant structural 
attacks on BLAKE2b, which allows us to assume that the collision resistance is 
given. For comparison: The revision control system Git currently uses 160 bit 
SHA-1 hashes to create unique identifiers for its objects, with plans underway to 
migrate to a stronger 256 bit hash algorithm³. 

Fig. 3. Incremental computation of a new fingerprint. Fingerprints are stored in a call 
stack, with each stack frame containing a partial fingerprint of all addressable memory 
allocated locally in that function, another partial fingerprint of all registers used in the 
function and a list of previously encountered fingerprints. A partial fingerprint of all 
dynamic and global variables is stored independently. 

To ensure that the fragments themselves are generated in a collision-free 
manner, we structure them with three fields each, as can be seen in Fig. 2. 
The first field contains a tag that lets us distinguish between different types of 
fragments, the middle field contains an address appropriate for that type, and the 
last field is the value that the fragment represents. We distinguish between three 
different address spaces: 1. main memory, 2. LLVM registers, which similarly 
to actual processors hold values that do not have a main memory address, and 
3. function arguments, which behave similarly to ordinary LLVM registers, but 
require a certain amount of special handling in our implementation. For example, 
the fragment (0x01, 0xFF3780, 0xFF) means that the memory address 0xFF3780 
holds the concrete byte 0xFF. This fragment hashes to ea58...f677. 

If the fragment represents a concrete value, its size is statically bounded by 
the kind of write being done. For example, a write to main memory requires 
1 byte + 8 byte + 1 byte = 10 byte and modifying a 64 bit register requires 
1 byte + 8 byte + 8 byte = 17 byte. In the case of fragments representing 
symbolic values, on the other hand, such a guarantee cannot effectively be made, 
as the symbolic expression may become arbitrarily large. Consider, for example, 
a symbolic expression of the form $\mathit{input}_1 + \mathit{input}_2 + \ldots + \mathit{input}_n$, whose 
result is directly influenced by an arbitrary number $n$ of input words. 
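A fixed-width (tag, address, value) layout makes the encoding of concrete fragments injective, which is what design goal 1(b) asks for. The Python sketch below reproduces the byte counts given above. The tag 0x01 for concrete main memory is taken from the example in the text; the register tag, the byte order and the helper names are assumptions, so the resulting digest need not match the ea58...f677 value quoted above.

```python
import hashlib

TAG_MEMORY_CONCRETE = 0x01    # from the example in the text
TAG_REGISTER_CONCRETE = 0x03  # assumption: any tag distinct from the others


def memory_fragment(addr: int, byte: int) -> bytes:
    """1 byte tag + 8 byte address + 1 byte value = 10 bytes."""
    return bytes([TAG_MEMORY_CONCRETE]) + addr.to_bytes(8, "big") + bytes([byte])


def register_fragment(reg_id: int, value: int) -> bytes:
    """1 byte tag + 8 byte register id + 8 byte (64 bit) value = 17 bytes."""
    return (bytes([TAG_REGISTER_CONCRETE]) + reg_id.to_bytes(8, "big")
            + value.to_bytes(8, "big"))


def fragment_digest(fragment: bytes) -> str:
    """Hex-encoded 256 bit BLAKE2b digest of a fragment."""
    return hashlib.blake2b(fragment, digest_size=32).hexdigest()


example = memory_fragment(0xFF3780, 0xFF)  # "address 0xFF3780 holds 0xFF"
```

The distinct leading tag keeps a memory fragment and a register fragment with the same address apart, even though the address fields are identical.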

In summary, fragments are created in a way that precludes structural weak- 
nesses as long as the hash algorithm used (in our case 256 bit BLAKE2b) remains 
unbroken and collisions are significantly less probable than transient failures of 
the computer performing the analysis. 


3 https://www.kernel.org/pub/software/scm/git/docs/technical/hash-function-transition.html 
(Retrieved Jan. 2018). 
5.3 Fingerprint Tracking 


When using the KLEE SymEx engine, the call stack is not explicitly mapped 
into the program’s address space, but rather directly managed by KLEE itself. 
This enables us to further extend the practical usefulness of our analysis by only 
considering fragments that are directly addressable from each point of the exe- 
cution, which in turn enables the detection of certain non-terminating recursive 
function calls. It also goes well together with the implicit cleanup of all function 
variables when a function returns to its caller. 

To incrementally construct the current fingerprint, we utilize a stack that 
follows the current call stack, as shown by example in Fig. 3. Each entry consists 
of three different parts: 1. A (partial) fingerprint over all local registers, i.e., 
objects that are not globally addressable, 2. A (partial) fingerprint over all locally 
allocated objects in main memory and 3. A list of pairs of instruction IDs and 
fingerprints, that denote the states that were encountered previously. 


Modifying Objects. Any instruction modifying an object without reading 
input, such as an addition, is dealt with as explained previously: First, recom- 
pute the hash of the old fragment(s) before the instruction is performed and 
remove it from the current fingerprint. Then, perform the instruction, compute 
the hash of the new fragment(s) and add it to the current fingerprint. 

Similarly modify the appropriate partial fingerprint, e.g., for a load the fin- 
gerprint of all local registers of the current function. Note that this requires each 
memory object to be mappable to where it was allocated from. 


Function Calls. To perform a function call, push a new entry onto the stack 
with the register fingerprint initialized to the xor of the hashes of the argument 
fragments and the main memory fingerprint set to the neutral element, zero. 
Update the current fingerprint by removing the caller’s register fingerprint and 
adding the callee’s register fingerprint. Add the pair of entry point and current 
fingerprint to the list of previously seen fingerprints. 


Function Returns. When returning from a function, first remove both the 
fingerprint of the local registers, as well as the fingerprint of local, globally 
addressable objects from the current fingerprint, as all of these will be implicitly 
destroyed by the returning function. Then pop the topmost entry from the stack 
and re-enable the fingerprint of the local registers of the caller. 


Reading Input. Upon reading input all previously encountered fingerprints 
must be disregarded by clearing all fingerprint lists of the current SymEx state. 
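The bookkeeping for modifications, calls, returns and input can be summarized in a small Python sketch. It is hypothetical and simplified: fragment hashes are passed in as plain integers (in practice they would be 256 bit BLAKE2b digests), the partial fingerprint of global and dynamic objects is omitted, and the class and method names are our own.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Frame:
    regs: int = 0   # partial fingerprint over local registers
    mem: int = 0    # partial fingerprint over locally allocated memory
    seen: List[Tuple[int, int]] = field(default_factory=list)  # (instr id, fp)


class FingerprintStack:
    """Hypothetical sketch of the per-SymEx-state bookkeeping."""

    def __init__(self) -> None:
        self.stack: List[Frame] = [Frame()]
        self.current = 0  # composite fingerprint (globals omitted here)

    def update_register(self, old_hash: int, new_hash: int) -> None:
        # Remove the old fragment hash, insert the new one (0 = absent).
        delta = old_hash ^ new_hash
        self.stack[-1].regs ^= delta
        self.current ^= delta

    def update_local_memory(self, old_hash: int, new_hash: int) -> None:
        delta = old_hash ^ new_hash
        self.stack[-1].mem ^= delta
        self.current ^= delta

    def call(self, entry_id: int, arg_hashes: List[int]) -> None:
        callee = Frame()
        for h in arg_hashes:
            callee.regs ^= h        # registers start as the argument fragments
        # Swap the caller's register fingerprint for the callee's.
        self.current ^= self.stack[-1].regs ^ callee.regs
        self.stack.append(callee)
        callee.seen.append((entry_id, self.current))

    def ret(self) -> None:
        callee = self.stack.pop()
        # The callee's locals are destroyed; re-enable the caller's registers.
        self.current ^= callee.regs ^ callee.mem ^ self.stack[-1].regs

    def read_input(self) -> None:
        # Input is progress: forget all previously encountered fingerprints.
        for frame in self.stack:
            frame.seen.clear()

    def check(self, instr_id: int) -> bool:
        """Record the state at a checkpoint; True signals a repeated state."""
        entry = (instr_id, self.current)
        frame = self.stack[-1]
        if entry in frame.seen:
            return True
        frame.seen.append(entry)
        return False
```

In this sketch, a call followed by the matching return restores the caller's composite fingerprint exactly, a checkpoint that recurs without intervening input is reported, and `read_input` discards the recorded history.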


5.4 Avoiding Comparisons 


While it would be sufficient to simply check all previous fingerprints for a repeti- 
tion every time the current fingerprint is modified, it would be rather inefficient 
to do so. To gain as much performance as possible, our implementation attempts 
to perform as few comparisons as possible. 
We reduce the number of fingerprints that need to be considered at any point 
by exploiting the structure of the call stack: To find any non-recursive infinite 
loop, it suffices to search the list of the current stack frame, while recursive 
infinite loops can be identified using only the first fingerprint of each stack frame. 

We also exploit static control flow information by only storing and testing 
fingerprints for Basic Blocks (BBs), which are sequences of instructions with 
linear control flow⁴. If any one instruction of a BB is executed infinitely often, 
all of them are. Thus, a BB is either fully in the infinite cycle, or no part of it is. 

It is not even necessary to consider every single BB, as we are looking for a 
trace with a finite prefix leading into a cycle. As the abstract transition system is 
an unfolding of the CFG, any cycle in the transition system must unfold from a 
cycle in the CFG. Any reachable cycle in the CFG must contain a BB with more 
than one predecessor, as at least one BB must be reachable from both outside 
and inside the cycle. Therefore, it is sufficient to only check BBs with multiple 
predecessors. As IR only provides intraprocedural CFGs, we additionally perform 
a check for infinite recursion at the beginning of each function. 
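Selecting the checkpoint blocks only requires predecessor counts in the intraprocedural CFG. The following Python sketch assumes the CFG is given as a dictionary from block names to successor lists, a hypothetical representation rather than LLVM's API; `checkpoint_blocks` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Set


def checkpoint_blocks(cfg: Dict[str, List[str]]) -> Set[str]:
    """Basic blocks with more than one predecessor in an intraprocedural CFG.

    `cfg` maps each block name to the names of its successor blocks.
    """
    preds: Dict[str, Set[str]] = defaultdict(set)
    for block, successors in cfg.items():
        for succ in successors:
            preds[succ].add(block)
    return {block for block, ps in preds.items() if len(ps) > 1}


loop_cfg = {"entry": ["header"], "header": ["body", "exit"],
            "body": ["header"], "exit": []}
diamond_cfg = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

For `loop_cfg`, only the loop header is selected. Blocks that merely join two acyclic paths, such as `d` in `diamond_cfg`, are selected as well, so the result is a cheap over-approximation of the blocks through which a cycle can be entered.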


6 Evaluation 


In this section we demonstrate the effectiveness and performance of our app- 
roach on well tested and widely used real-world software. We focus on three 
different groups of programs: 1. The GNU Coreutils and GNU sed (Sect. 6.1), 
2. BusyBox (Sect. 6.2) and 3. Toybox (Sect. 6.3) and evaluate the performance 
of our liveness analysis in comparison with baseline KLEE in the following met- 
rics: 1. instructions per second and 2. peak resident set size. Additionally, we 
analyze the impact of the time limit on the overhead (Sect. 6.4). We summarize 
our findings in Sect. 6.5. 


Setup. We used revision aa01f83⁵ of our software, which is based on KLEE 
revision 37f554d⁶. Both versions are invoked as suggested by the KLEE authors 
and maintainers [10,47] in order to maximize reproducibility and ensure realistic 
results. However, we chose the Z3 [39] solver over STP [20] as the former 
provides a native timeout feature, enabling more reliable measurements. The 
solver timeout is 30 s and the memory limit is 10 000 MiB. 

We run each configuration 20 times in order to gain statistical confidence in 
the results. From every single run, we extract both the number of executed 
instructions, allowing us to compute the instructions per second, and the peak 
resident set size of the process, i.e., the maximal amount of memory used. We 
additionally reproduced the detected liveness violations with 30 runs each with 
a time limit of 24 h, recording 
the total time required for our implementation to find the first violation. For all 
results we give a 99% confidence interval. 


4 In IR there is an exemption for function calls, namely they do not break up BBs. 
5 https://github.com/COMSYS/SymbolicLivenessAnalysis/tree/aa01f83. 
6 https://github.com/klee/klee/tree/37f554d. 


Symbolic Liveness Analysis of Real-World Software 459 


6.1 GNU Utilities 


We combine the GNU tools from the Coreutils 8.25 [45] with GNU sed 4.4 [46], 
as the other tool suites also contain an implementation of the sed utility. We 
excluded 4 tools from the experiment as their execution is not captured by 
KLEE’s system model. Thus, the experiment contains a total of 103 tools. 


Violations. The expected liveness violation in yes occurred after 2.51 s ± 0.26 s. 
In 26 out of 30 runs, we were also able to detect a violation in GNU sed after 
a mean computation time of 8.06 h ± 3.21 h (KLEE’s timeout was set to 24 h). 
With the symbolic arguments restricted to one argument of four symbolic 
characters, reproduction completed in all 30 runs with a mean of 5.19 min ± 0.17 min. 
Fig. 4. GNU Coreutils and GNU sed, 60 min time limit. Relative change of instructions 
per second (top) and peak resident set (bottom) versus the KLEE baseline. Note the 
logarithmic scale and the black 99% confidence intervals. 
Fig. 5. BusyBox, 60 min time limit. Relative change of instructions per second (top) 
and peak resident set (bottom) versus the KLEE baseline. Note the logarithmic scale 
and the black 99% confidence intervals. 


We detected multiple violations in tail stemming from two previously
unknown bugs, which we reported. Both bugs were originally detected and
reported in version 8.25 (footnote 7) and fixed in version 8.26. Both bugs had
been in the codebase for over 16 years. Reproducing the detection was successful
in 30 of 30 attempts, with a mean time of 1.59 h ± 0.66 h until the first
detected violation.

We detected another previously unknown bug in ptx. Although we originally
identified the bug in version 8.27, we reported it after the release of 8.28
(footnote 8), leading to a fix in version 8.29. This bug is not easily detected:
only 9 of 30 runs completed within the time limit of 24 h. For these, the mean
time to first detection was 17.15 h ± 3.74 h.


7 GNU tail report 1: http://bugs.gnu.org/24495.
GNU tail report 2: http://bugs.gnu.org/24903.
8 GNU ptx report: http://bugs.gnu.org/28417.


Performance. Figure 4 shows the relative changes in instructions per second
and peak resident set. As can be seen, performance is reduced only slightly
below the KLEE baseline, and the memory overhead is even less significant. The
leftmost tool, make-prime-list, shows by far the most significant change from
the KLEE baseline. This is because make-prime-list reads very little
input, followed by a very complex computation in the course of which no further 
input is read. 


6.2 BusyBox 


For this experiment we used BusyBox version 1.27.2 [44]. As BusyBox contains 
a large number of network tools and daemons, we had to exclude 232 tools from 
the evaluation, leaving us with 151 tools. 


Violations. Compared with Coreutils’ yes, detecting the expected liveness
violation in the BusyBox implementation of yes took comparatively long, at
27.68 s ± 0.33 s. We were unable to detect any violations in BusyBox sed
without restricting the size of the symbolic arguments. When restricting them to one
argument with four symbolic characters, we found the first violation in all 30
runs within 1.44 h ± 0.08 h. Our evaluation uncovered two previously unknown
bugs in BusyBox hush (footnote 9). We first detected both bugs in version 1.27.2.
In all 30 runs, a violation was detected after 71.73 s ± 5.00 s.


Performance. As shown in Fig. 5, BusyBox has a higher slowdown on average
than the GNU Coreutils (cf. Fig. 4). Several tools show a decrease in memory
consumption that we attribute to the drop in retired instructions. yes shows the 
least throughput, as baseline KLEE very efficiently evaluates the infinite loop. 
Fig. 6. Toybox, 60 min time limit. Relative change of instructions per second (top)
and peak resident set (bottom) versus the KLEE baseline. Note the logarithmic scale 
and the black 99% confidence intervals. 


9 BusyBox hush report 1: https://bugs.busybox.net/10421.
BusyBox hush report 2: https://bugs.busybox.net/10686.
6.3 Toybox


The third and final experiment with real-world software consists of 100 tools
from toybox 0.7.5 [48]. We excluded 76 of the total of 176 tools because they
rely on operating system features not reasonably modeled by KLEE.


Violations. For yes, we encounter the first violation after 6.34 s ± 0.24 s, which
puts it between the times for GNU yes and BusyBox yes. This violation is
also triggered from env by way of toybox’s internal path lookup. As with the
other sed implementations, toybox sed often fails to complete when run with
the default parameter set. With only one symbolic argument of four symbolic
characters, however, we encountered a violation in all 30 runs within
4.99 min ± 0.25 min.


Performance. Overall, the performance of our approach on toybox also lies
between that on the GNU Coreutils and that on BusyBox, as can be seen in Fig. 6.
Both memory and throughput overhead are limited. For most toybox tools, the
overhead is small enough to warrant always enabling our changes when running KLEE.
Fig. 7. Changes in instructions per second, peak resident set and branch coverage over multiple KLEE timeouts. Note the logarithmic scale and the black 99% confidence intervals.
Fig. 8. Heap usage of a 30 min BusyBox hush run. The 186 black vertical lines show detected liveness violations.


6.4 Scaling with the Time Limit 


To ascertain whether the performance penalty incurred by our implementation
scales with the KLEE time limit, we repeated each experiment with time
limits of 15 min, 30 min, and 60 min. The results shown in Fig. 7 indicate that,
at least at this scale, baseline KLEE and our implementation scale equally well.
This is true for almost all relevant metrics: retired instructions per second, peak
resident set, and covered branches. The prominent exception is BusyBox’s memory
usage, exemplified in Fig. 8 by a 30 min run of BusyBox hush.
As can be seen, the overhead introduced by the liveness analysis is mostly stable
at about a quarter of the total heap usage.


462 D. Schemmel et al. 


6.5 Summary 


All evaluated tool suites show a low average performance and memory penalty 
when comparing our approach to baseline KLEE. While the slowdown is signif- 
icant for some tools in each suite, it is consistent as long as time and memory 
limits are not chosen too tightly. In fact, for these kinds of programs, it is rea- 
sonable to accept a limited slowdown in exchange for opening up a whole new 
category of defects that can be detected. In direct comparison, performance
varies between suites, but remains reasonable in each case.


7 Limitations 


Our approach does not distinguish between interpreters and interpreted pro- 
grams. While this enables the automatic derivation of input programs for such 
interpreters as sed, it also makes it hard to recognize meaningful error cases. This 
causes the analysis of all three implementations of sed used in the evaluation 
(Sect. 6) to return liveness violations. 

In its current form, our implementation struggles with runaway counters, as a
64-bit counter cannot be practically enumerated on current hardware. Combining
our approach with static analyses, such as those performed by optimizing
compilers, may significantly reduce the impact of this problem in the future.
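To make the runaway-counter problem concrete, the toy C sketch below (not the authors' implementation; all names are hypothetical) models a loop whose only state is a wrapping counter and detects a liveness violation the naive way, by remembering every visited state until one repeats. The real implementation fingerprints complete SymEx states, but the counting argument is the same.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bound on how many states the toy detector remembers. */
#define MAX_TRACKED 4096

/* Toy loop body: increment a counter that is `bits` bits wide (1..64) and
 * wraps around. Nothing else changes, and no input is read. */
static uint64_t counter_step(uint64_t state, unsigned bits)
{
    uint64_t mask = (bits >= 64) ? UINT64_MAX : (((uint64_t)1 << bits) - 1);
    return (state + 1) & mask;
}

/* Runs the loop from `start`, remembering every state visited. Returns the
 * iteration after which a previously seen state recurs (since the toy loop is
 * deterministic and reads no input, it then cycles forever), or 0 if no state
 * repeats within `max_steps` iterations (capped at MAX_TRACKED). */
static size_t steps_until_repeat(uint64_t start, unsigned bits, size_t max_steps)
{
    static uint64_t seen[MAX_TRACKED];
    size_t num_seen = 0;
    uint64_t state = start;

    if (max_steps > MAX_TRACKED)
        max_steps = MAX_TRACKED;
    for (size_t step = 1; step <= max_steps; step++) {
        seen[num_seen++] = state;
        state = counter_step(state, bits);
        for (size_t k = 0; k < num_seen; k++)
            if (seen[k] == state)
                return step;
    }
    return 0;
}
```

With an 8-bit counter, `steps_until_repeat(0, 8, 4096)` reports a repetition after 256 iterations. With a 64-bit counter, no repetition can appear before 2^64 iterations, so the same call with `bits = 64` gives up and returns 0.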

A different pattern that may confound our implementation is related to 
repeated allocations. If memory is requested again after releasing it, the newly 
acquired memory may not be at the same position, which causes any pointers 
to it to have different values. While this is fully correct, it may prevent the
implementation from recognizing cycles in a reasonable time frame. This could be
mitigated by analyzing whether the value of the pointer ever actually matters. 
For example, in the C programming language, it is fairly uncommon to inspect 
the numerical value of a pointer beyond comparing it to NULL or other pointers. 
A valid solution would however require strengthening KLEE’s memory model, 
which currently does not model pointer inspection very well. 
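The suggested mitigation can be illustrated with a hypothetical toy state holding a counter and a pointer to a buffer that is re-allocated on every iteration. The sketch contrasts a fingerprint that includes the pointer's numeric value with one that keeps only whether it is NULL. Neither function is part of KLEE or of the authors' implementation, and the abstraction ignores comparisons between pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy state: a loop counter plus a pointer to a buffer that the
 * program frees and re-allocates on every iteration. */
struct toy_state {
    uint32_t counter;
    const void *buffer;
};

/* Concrete fingerprint: mixes in the pointer's numeric value. Two states with
 * the same counter but different buffer addresses get different fingerprints,
 * so a cycle that differs only in allocation placement goes unnoticed. */
static uint64_t fingerprint_concrete(const struct toy_state *s)
{
    return ((uint64_t)s->counter << 32) ^ (uint64_t)(uintptr_t)s->buffer;
}

/* Abstracted fingerprint: keeps only whether the pointer is NULL. This is
 * justified only if the program never inspects the pointer beyond NULL
 * checks, which a separate analysis would have to establish. */
static uint64_t fingerprint_null_only(const struct toy_state *s)
{
    return ((uint64_t)s->counter << 1) | (uint64_t)(s->buffer != NULL);
}
```

Two states whose buffers sit at different addresses are distinguished by the first function but merged by the second, which is the effect the mitigation aims for.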

Another potential problem is how the PC is serialized when symbolic expressions
are used as the value of a fragment (cf. Sect. 5.2). We currently reuse KLEE’s
serialization routines, which are not tuned for performance. Also, each
symbolic value that KLEE generates is assigned a unique name, which then
appears in the serialization; this hides potential equivalences between states.

Finally, by building upon SymEx, we inherit not only its strengths, but also 
its weaknesses, such as a certain predilection for state explosion and a reliance 
on repeated SMT solving [12]. Moreover, actual SymEx implementations impose
further limitations. For example, KLEE returns a concrete pointer from allocation
routines instead of a symbolic value representing all possible addresses. 


8 Conclusion and Outlook 


It is our strong belief that the testing and verification of liveness properties 
needs to become more attractive to developers of real-world programs. Our work 
provides a step in that direction with the formulation of a liveness property that
is general and practically useful, thereby enabling even developers uncomfortable 
with interacting with formal testing and verification methods to at least check 
their software for liveness violation bugs. 

We demonstrated the usefulness of our liveness property by implementing it 
as an extension to the Symbolic Execution engine KLEE, thereby enabling it to 
discover a class of software defects it could not previously detect, and analyzing 
several large and well-tested programs. Our implementation led to the discovery
and eventual correction of a total of five previously unknown defects: three in
the GNU Coreutils, arguably one of the most well-tested code bases in existence,
and two in BusyBox. Each of these bugs had been in released software for over 7
years, four of them even for over 16 years, which shows that this class of
bugs has so far proven elusive. Our implementation did not produce a single false
positive: all reported violations are accompanied by concrete test cases
that reproduce a violation of our liveness property.

The evaluation in Sect. 6 also showed that the performance impact, in terms of
both throughput and memory consumption, remains significantly below 2× on
average, while allowing the analysis to detect a completely new
range of software defects. We demonstrated that this overhead remains stable 
over a range of different analysis durations. 

In future work, we will explore the opportunities for same-state merging that 
our approach enables by implementing efficient equality testing of SymEx states 
via our fingerprinting scheme. We expect that this will further improve the
performance of our approach, possibly even beyond KLEE’s baseline performance,
by reducing the amount of duplicate work done. 
Abstract. This paper describes our experience with symbolic model
checking in an industrial setting. We have proved that the initial boot 
code running in data centers at Amazon Web Services is memory safe, 
an essential step in establishing the security of any data center. Standard 
static analysis tools cannot be easily used on boot code without modifica- 
tion owing to issues not commonly found in higher-level code, including 
memory-mapped device interfaces, byte-level memory access, and linker 
scripts. This paper describes automated solutions to these issues and 
their implementation in the C Bounded Model Checker (CBMC). CBMC 
is now the first source-level static analysis tool to extract the memory 
layout described in a linker script for use in its analysis. 


1 Introduction 


Boot code is the first code to run in a data center; thus, the security of a data 
center depends on the security of the boot code. It is hard to demonstrate boot 
code security using standard techniques, as boot code is difficult to test and 
debug, and boot code must run without the support of common security miti- 
gations available to the operating system and user applications. This industrial 
experience report describes work to prove the memory safety of initial boot code 
running in data centers at Amazon Web Services (AWS). 

We describe the challenges we faced analyzing AWS boot code, some of which 
render existing approaches to software verification unsound or imprecise. These 
challenges include

– memory-mapped input/output (MMIO) for accessing devices,
– device behavior behind these MMIO regions,
– byte-level memory access as the dominant form of memory access, and
– linker scripts used during the build process.


Not handling MMIO or linker scripts results in imprecision (false positives), and 
not modeling device behavior is unsound (false negatives). 

We describe the solutions to these challenges that we developed. We imple- 
mented our solutions in the C Bounded Model Checker (CBMC) [20]. We achieve 
soundness with CBMC by fully unrolling loops in the boot code. Our solutions
automate boot code verification and require no changes to the code being ana- 
lyzed. This makes our work particularly well-suited for deployment in a continu- 
ous validation environment to ensure that memory safety issues do not reappear 
in the code as it evolves during development. We use CBMC, but any other 
bit-precise, sound, automated static analysis tool could be used. 


2 Related Work 


There are many approaches to finding memory safety errors in low-level code, 
from fuzzing [2] to static analysis [24,30,39,52] to deductive verification [21,34]. 

A key aspect of our work is soundness and precision in the presence of very 
low-level details. Furthermore, full automation is essential in our setting to oper- 
ate in a continuous validation environment. This makes some form of model 
checking most appealing. 

CBMC is a bounded model checker for C, C++, and Java programs, available 
on GitHub [13]. It features bit-precise reasoning, and it verifies array bounds 
(buffer overflows), pointer safety, arithmetic exceptions, and assertions in the 
code. A user can bound the model checking done by CBMC by specifying for a 
loop a maximum number of iterations of the loop. CBMC can check that it is 
impossible for the loop to iterate more than the specified number of times by 
checking a loop-unwinding assertion. CBMC is sound when all loop-unwinding 
assertions hold. Loops in boot code typically iterate over arrays of known sizes, 
making it possible to choose loop unwinding limits such that all loop-unwinding 
assertions hold (see Sect. 5.7). BLITZ [16] or F-Soft [36] could be used in place 
of CBMC. SATABS [19], Ufo [3], Cascade [55], Blast [9], CPAchecker [10], Cor- 
ral [33,43,44], and others [18,47] might even enable unbounded verification. Our 
work applies to any sound, bit-precise, automated tool. 
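As a hedged illustration of choosing an unwinding limit, the C function below shows the kind of fixed-size loop described above; the fuse-bank array and its size are hypothetical. The comment gives a CBMC invocation, assuming the harness file is called harness.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: boot code reads configuration from a fixed number of fuse banks. */
#define NUM_FUSE_BANKS 8

/* A typical boot-code loop: it iterates over an array whose size is a
 * compile-time constant, so it runs exactly NUM_FUSE_BANKS times. A bound of
 * one more than that (9 here) is commonly needed so that the final check of
 * the loop condition is covered as well, e.g.:
 *   cbmc harness.c --bounds-check --pointer-check --unwind 9 --unwinding-assertions
 * If the unwinding assertion holds, the bounded result is also sound. */
uint32_t combine_fuse_banks(const uint32_t banks[NUM_FUSE_BANKS])
{
    uint32_t config = 0;
    for (size_t i = 0; i < NUM_FUSE_BANKS; i++)
        config |= banks[i];
    return config;
}
```

Because the iteration count does not depend on any input, a single fixed bound discharges the unwinding assertion for every execution.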

Note that boot code makes heavy use of pointers, bit vectors, and arrays, 
but not the heap. Thus, memory safety proof techniques based on three-valued 
logic [45] or separation logic as in [8] or other techniques [1,22] that focus on the 
heap are less appropriate since boot code mostly uses simple arrays. 

KLEE [12] is a symbolic execution engine for C that has been used to find 
bugs in firmware. Davidson et al. [25] built the tool FIE on top of KLEE for 
detecting bugs in firmware programs for the MSP430 family of microcontrollers 
for low-power platforms, and applied the tool to nearly a hundred open source 
firmware programs for nearly a dozen versions of the microcontroller to find bugs 
like buffer overflow and writing to read-only memory. Corin and Manzano [23] 
used KLEE to do taint analysis and prove confidentiality and integrity proper- 
ties. KLEE and other tools like SMACK [49] based on the LLVM intermediate 
representation do not currently support the linker scripts that are a crucial part 
of building boot code (see Sect. 4.5). They support partial linking by concatenat- 
ing object files and resolving symbols, but fail to make available to their analysis 
the addresses and constants assigned to symbols in linker scripts, resulting in an 
imprecise analysis of the code. 
S2E [15] is a symbolic execution engine for x86 binaries built on top of the
QEMU [7] virtual machine and KLEE. S2E has been used on firmware. Parvez
et al. [48] use symbolic execution to generate inputs targeting a potentially buggy
statement for debugging. Kuznetsov et al. [42] used a prototype of S2E to find
bugs in Microsoft device drivers. Zaddach et al. [56] built the tool Avatar on
top of S2E to check the security of embedded firmware. They test firmware running
on top of actual hardware, moving device state between the concrete device and
the symbolic execution. Bazhaniuk et al. [6,28] used S2E to search for security
vulnerabilities in interrupt handlers for System Management Mode on Intel
platforms. Experts can use S2E on firmware. One can model device behavior (see
Sect. 4.2) by adding a device model to QEMU or using the signaling mechanism
used by S2E during symbolic execution. One can declare an MMIO region (see
Sect. 4.1) by inserting it into the QEMU memory hierarchy. Both require
understanding either the QEMU or the S2E implementation. Our goal is to make it
as easy as possible to use our work, primarily by way of automation.

Ferreira et al. [29] verify a task scheduler for an operating system, but that 
is high in the software stack. Klein et al. [38] prove the correctness of the seL4 
kernel, but that code was written with the goal of proof. Dillig et al. [26] syn- 
thesize guards ensuring memory safety in low-level code, but our code is written 
by hand. Rakamarić and Hu [50] developed a conservative, scalable approach to
memory safety in low-level code, but the models there are not tailored to our code 
that routinely accesses memory by an explicit integer-valued memory address. 
Redini et al. [51] built a tool called BootStomp on top of angr [54], a frame- 
work for symbolic execution of binaries based on a symbolic execution engine 
for the VEX intermediate representation for the Valgrind project, resulting in a 
powerful testing tool for boot code, but it is not sound. 


3 Boot Code 


We define boot code to be the code in a cloud data center that runs from the 
moment the power is turned on until the BIOS starts. It runs before the operating 
system’s boot loader that most people are familiar with. A key component to 
ensuring high confidence in data center security is establishing confidence in boot 
code security. Enhancing confidence in boot code security is a challenge because 
of unique properties of boot code not found in higher-level software. We now 
discuss these properties of boot code, and a path to greater confidence in boot 
code security. 


3.1 Boot Code Implementation 


Boot code starts a sequenced boot flow [4] in which each stage locates, loads, 
and launches the next stage. The boot flow in a modern data center proceeds as 
follows: (1) When the power is turned on, before a single instruction is executed, 
the hardware interrogates banks of fuses and hardware registers for configuration 
information that is distributed to various parts of the platform. (2) Boot code 
starts up to boot a set of microcontrollers that orchestrate bringing up the
rest of the platform. In a cloud data center, some of these microcontrollers are 
feature-rich cores with their own devices used to support virtualization. (3) The 
BIOS familiar to most people starts up to boot the cores and their devices. 
(4) A boot loader for the hypervisor launches the hypervisor to virtualize those 
cores. (5) A boot loader for the operating system launches the operating system 
itself. The security of each stage, including the operating system launched for the
customer, depends on the integrity of all prior stages [27]. 

Ensuring boot code security using traditional techniques is hard. Visibility 
into code execution can only be achieved via debug ports, with almost no abil- 
ity to single-step the code for debugging. UEFI (Unified Extensible Firmware 
Interface) [53] provides an elaborate infrastructure for debugging BIOS, but not 
for the boot code below BIOS in the software stack. Instrumenting boot code 
may be impossible because it can break the build process: the instrumented code
can be larger than the ROM targeted by the build process. Extracting the data
collected by instrumentation may be difficult
because the code has no access to a file system to record the data, and memory 
available for storing the data may be limited. 

Static analysis is a relatively new approach to enhancing confidence in boot 
code security. As discussed in Sect. 2, most work applying static analysis to boot 
code applies technology like symbolic execution to binary code, either because 
the work strips the boot code from ROMs on shipping products for analysis 
and reverse engineering [42,51], or because code like UEFI-based implementa- 
tions of BIOS loads modules with a form of dynamic linking that makes source 
code analysis of any significant functionality impossible [6,28]. But with access 
to the source code—source code without the complexity of dynamic linking— 
meaningful static analysis at the source code level is possible. 


3.2 Boot Code Security 


Boot code is a foundational component of data center security: it controls what 
code is run on the server. Attacking boot code is a path to booting your own 
code, installing a persistent root kit, or making the server unbootable. Boot code 
also initializes devices and interfaces directly with them. Attacking boot code 
can also lead to controlling or monitoring peripherals like storage devices. 

The input to boot code is primarily configuration information. The run- 
time behavior of boot code is determined by configuration information in fuses, 
hardware straps, one-time programmable memories, and ROMs. 

From a security perspective, boot code is susceptible to a variety of events 
that could set the configuration to an undesirable state. To keep any malicious 
adversary from modifying this configuration information, the configuration is 
usually locked or otherwise write-protected. Nonetheless, it is routine to dis- 
cover during hardware vetting before placing hardware on a data center floor 
that some BIOS added by a supplier accidentally leaves a configuration register 
unlocked after setting it. In fact, configuration information can be intentionally 
unlocked for the purpose of patching and then be locked again. Any bug in a 
patch or in a patching mechanism has the potential to leave a server in a vulner-
able configuration. Perhaps more likely than anything is a simple configuration 
mistake at installation. We want to know that no matter how a configuration 
may have been corrupted, the boot code will operate as intended and without 
latent exposures for potential adversaries. 

The attack surface we focus on in this paper is memory safety, meaning 
there are no buffer overflows, no dereferencing of null pointers, and no pointers 
pointing into unallocated regions of memory. Code written in C is known to 
be at risk for memory safety, and boot code is almost always written in C, in 
part because of the direct connection between boot code and the hardware, and 
sometimes because of space limitations in the ROMs used to store the code. 

There are many techniques for protecting against memory safety errors and 
mitigating their consequences at the higher levels of the software stack. Lan- 
guages other than C are less prone to memory safety errors. Safe libraries can do 
bounds checking for standard library functions. Extensions to compilers
like gcc and clang can help detect buffer overflow when it happens (which is
different from keeping it from happening). Address space layout randomization 
makes it harder for the adversary to make reliable use of a vulnerability. None of 
these mitigations, however, apply to firmware. Firmware is typically built using 
the tool chain that is provided by the manufacturer of the microcontroller, and 
firmware typically runs before the operating system starts, without the benefit of 
operating system support like a virtual machine or randomized memory layout. 


4 Boot Code Verification Challenges 


Boot code poses challenges to the precision, soundness, and performance of any 
analysis tool. The C standard [35] says, “A volatile declaration may be used to 
describe an object corresponding to an MMIO port” and “what constitutes an 
access to an object that has volatile-qualified type is implementation-defined.” 
Any tool that seeks to verify boot code must provide means to model what the 
C standard calls implementation-defined behavior. Of all such behavior, MMIO 
and device behavior are most relevant to boot code. In this section, we discuss 
these issues and the solutions we have implemented in CBMC. 


4.1 Memory-Mapped I/O 


Boot code accesses a device through memory-mapped input/output (MMIO). 
Registers of the device are mapped to specific locations in memory. Boot code 
reads or writes a register in the device by reading or writing a specific location in 
memory. If boot code wants to set the second bit in a configuration register, and if 
that configuration register is mapped to the byte at location 0x1000 in memory, 
then the boot code sets the second bit of the byte at 0x1000. The problem 
posed by MMIO is that there is no declaration or allocation in the source code 
specifying this location 0x1000 as a valid region of memory. Nevertheless accesses 
within this region are valid memory accesses, and should not be flagged as an 
out-of-bounds memory reference. This is an example of implementation-defined
behavior that must be modeled to avoid reporting false positives. 
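The C sketch below shows the access pattern just described; `mmio_set_bit` is a hypothetical helper, and bit 1 is taken to be the "second bit", counting from the least significant bit. Because nothing in the program declares the target byte, a checker without an MMIO model would flag the dereference as out of bounds.

```c
#include <stdint.h>

/* Sets bit `bit` (0 = least significant) of the byte-wide device register at
 * integer address `reg_addr`, the way boot code typically does it: compute
 * the address as an integer, convert it to a pointer, and dereference it.
 * `volatile` keeps the compiler from eliding or merging the device access.
 * For the register in the text, the call would be mmio_set_bit(0x1000, 1). */
static void mmio_set_bit(uintptr_t reg_addr, unsigned bit)
{
    volatile uint8_t *reg = (volatile uint8_t *)reg_addr;
    *reg = (uint8_t)(*reg | (1u << bit));
}
```

Outside the target hardware, the sketch can be exercised by passing the address of an ordinary byte in place of 0x1000.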

To facilitate analysis of low-level code, we have added to CBMC a built-in 
function 


__CPROVER_allocated_memory (address, size) 


to mark ranges of memory as valid. Accesses within this region are exempt 
from the out-of-bounds assertion checking that CBMC would normally do. The 
function declares the half-open interval [address, address+size) as valid memory 
that can be read and written. This function can be used anywhere in the source 
code, but is most commonly used in the test harness. (CBMC, like most program 
analysis approaches, uses a test harness to drive the analysis.) 
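To make the half-open interval semantics concrete, here is a plain-C sketch of the in-bounds check that a declaration such as `__CPROVER_allocated_memory(0x1000, 0x10)` makes possible. It is only an analogy, not CBMC's implementation; `declare_valid` and `access_in_bounds` are hypothetical names, and an access straddling two adjacent regions is conservatively rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity of the region table. */
#define MAX_REGIONS 16

/* A declared-valid range of memory: the half-open interval [base, base + size). */
struct region {
    uintptr_t base;
    size_t size;
};

static struct region valid_regions[MAX_REGIONS];
static size_t num_valid_regions;

/* Records [base, base + size) as valid memory; returns false if the table is full. */
static bool declare_valid(uintptr_t base, size_t size)
{
    if (num_valid_regions == MAX_REGIONS)
        return false;
    valid_regions[num_valid_regions].base = base;
    valid_regions[num_valid_regions].size = size;
    num_valid_regions++;
    return true;
}

/* An access of `len` bytes at `addr` is in bounds if it lies entirely inside
 * a single declared region. The comparisons avoid computing addr + len,
 * which could overflow. */
static bool access_in_bounds(uintptr_t addr, size_t len)
{
    for (size_t k = 0; k < num_valid_regions; k++) {
        const struct region *r = &valid_regions[k];
        if (addr >= r->base && len <= r->size && addr - r->base <= r->size - len)
            return true;
    }
    return false;
}
```

After declaring [0x1000, 0x1010), a one-byte access at 0x100F is accepted, while accesses at 0x1010 or a two-byte access starting at 0x100F are rejected.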


4.2 Device Behavior 


An MMIO region is an interface to a device. It is unsound to assume that the 
values returned by reading and writing this region of memory follow the seman- 
tics of ordinary read-write memory. Imagine a device that can generate unique 
ids. If the register returning the unique id is mapped to the byte at location 
0x1000, then reading location 0x1000 will return a different value every time, 
even without intervening writes. These side effects have to be modeled. One 
easy approach is to ‘havoc’ the device, meaning that writes are ignored and 
reads return nondeterministic values. This is sound, but may lead to too many 
false positives. We can model the device semantics more precisely, using one of 
the options described below. 

If the device has an API, we havoc the device by making use of a more general 
functionality we have added to CBMC. We have added a command-line option 


--remove-function-body device_access 


to CBMC’s goto-instrument tool. When used, this will drop the implemen- 
tation of the function device_access from compiled object code. If there is 
no other definition of device_access, CBMC will model each invocation of 
device_access as returning an unconstrained value of the appropriate return 
type. Now, to havoc a device with an API that includes a read and write method, 
we can use this command-line option to remove their function bodies, and CBMC 
will model each invocation of read as returning an unconstrained value. 

At link time, if another object file, such as the test harness, provides a second 
definition of device_access, CBMC will use this definition in its place. Thus, to 
model device semantics more precisely, we can provide a device model in the test 
harness by providing implementations of (or approximations for) the methods 
in the API. 
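As an example of a more precise device model supplied by the harness, the sketch below models a hypothetical configuration register that can be locked, in the spirit of the write-protected configuration registers discussed in Sect. 3.2. The three API functions are assumptions about what a boot code device API might look like. A havoc model would instead ignore writes and let every read return an arbitrary value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device API assumed to be called by the boot code. Once the
 * real bodies are dropped with --remove-function-body, harness definitions
 * like the ones below would be linked in their place. */
void device_write_config(uint32_t value);
void device_lock_config(void);
uint32_t device_read_config(void);

/* Model state. The initial value of 0 is an assumption made for this sketch;
 * in a CBMC harness it would usually be left unconstrained, since fuses and
 * ROMs determine it. */
static uint32_t config_value;
static bool config_locked;

/* Writes take effect only until the register is locked. */
void device_write_config(uint32_t value)
{
    if (!config_locked)
        config_value = value;
}

void device_lock_config(void)
{
    config_locked = true;
}

/* Reads return the last value written before the lock. */
uint32_t device_read_config(void)
{
    return config_value;
}
```

A model like this rules out the behaviors a havoc model would allow (for example, a locked register changing value), which reduces false positives only insofar as the real device actually behaves this way.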

If the device has no API, meaning that the code refers directly to the address 
in the MMIO region for the device without reference to accessor functions, we 
have another method. We have added two function symbols 


__CPROVER_mm_io_r(address, size) 
__CPROVER_mm_io_w(address, size, value) 
to CBMC to model the reading or writing of an address at a fixed integer address.
If the test harness provides implementations of these functions, CBMC will use 
these functions to model every read or write of memory. For example, defining 


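char nondet_char(void); /* no body: CBMC treats the result as arbitrary */
char __CPROVER_mm_io_r(void *a, unsigned s) {
  if (a == (void *)0x1000) return 2;
  return nondet_char(); /* any other address: nondeterministic value */
}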


will return the value 2 upon any access at address 0x1000, and return a non- 
deterministic value in all other cases. 

In both cases, with or without an API, we can thus obtain a sound and, if
needed, precise analysis of this aspect of implementation-defined behavior.


4.3 Byte-Level Memory Access 


It is common for boot code to access memory a byte at a time, and to access a 
byte that is not part of any variable or data structure declared in the program 
text. Accessing a byte in an MMIO region is the most common example. Boot 
code typically accesses this byte in memory by computing the address of the 
byte as an integer value, coercing this integer to a pointer, and dereferencing 
this pointer to access that byte. Boot code references memory by this kind of 
explicit address far more frequently than it references memory via some explicitly 
allocated variable or data structure. Any tool analyzing boot code must have a 
method for reasoning efficiently about accessing an arbitrary byte of memory. 

The natural model of memory is an array of bytes, and CBMC models memory
this way. Any decision procedure that has a well-engineered implementation
of a theory of arrays is likely to do a good job of modeling byte-level memory 
access. We improved CBMC’s decision procedure for arrays to follow the state- 
of-the-art algorithm [17,40]. The key data structure is a weak equivalence graph 
whose vertices correspond to array terms. Given an equality a = b between two 
array terms a and b, add an unlabeled edge between a and b. Given an update
a{i ← v} of an array term a, add an edge labeled i between a and a{i ← v}.
Two array terms a and b are weakly equivalent if there is a path from a to b 
in the graph, and they are equal at all indices except those updated along the 
path. This graph is used to encode constraints on array terms for the solver. For 
simplicity, our implementation generates these constraints eagerly. 
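The simplified C sketch below shows how a weak equivalence graph answers queries, assuming that all store indices are concrete nonnegative integers; the procedure of [17,40] handles symbolic indices and turns paths into solver constraints instead. The graph representation and function names are hypothetical.

```c
#include <stdbool.h>

#define MAX_TERMS 32
#define MAX_EDGES 64
#define EQUALITY_EDGE (-1) /* label of an unlabeled edge, from a = b */

/* Edge between two array terms: an equality (label EQUALITY_EDGE) or a store
 * a{i <- v} (label i, assumed to be a concrete, nonnegative index). */
struct weq_edge {
    int u, v;
    int label;
};

struct weq_graph {
    struct weq_edge edges[MAX_EDGES];
    int num_edges;
};

static void weq_init(struct weq_graph *g) { g->num_edges = 0; }

static bool weq_add_edge(struct weq_graph *g, int u, int v, int label)
{
    if (g->num_edges == MAX_EDGES)
        return false;
    g->edges[g->num_edges].u = u;
    g->edges[g->num_edges].v = v;
    g->edges[g->num_edges].label = label;
    g->num_edges++;
    return true;
}

/* Equality a = b: unlabeled edge. */
static bool weq_add_equality(struct weq_graph *g, int a, int b)
{
    return weq_add_edge(g, a, b, EQUALITY_EDGE);
}

/* Update stored = a{index <- v}: edge labeled with the updated index. */
static bool weq_add_store(struct weq_graph *g, int a, int stored, int index)
{
    return weq_add_edge(g, a, stored, index);
}

/* Depth-first search from `from` to `to`. If `avoid_label` is nonnegative,
 * store edges carrying that index are not traversed. */
static bool weq_path(const struct weq_graph *g, int from, int to, int avoid_label)
{
    bool visited[MAX_TERMS] = { false };
    int stack[MAX_TERMS];
    int top = 0;

    stack[top++] = from;
    visited[from] = true;
    while (top > 0) {
        int n = stack[--top];
        if (n == to)
            return true;
        for (int k = 0; k < g->num_edges; k++) {
            const struct weq_edge *e = &g->edges[k];
            int other;
            if (avoid_label >= 0 && e->label == avoid_label)
                continue;
            if (e->u == n)
                other = e->v;
            else if (e->v == n)
                other = e->u;
            else
                continue;
            if (!visited[other]) {
                visited[other] = true;
                stack[top++] = other;
            }
        }
    }
    return false;
}

/* a and b are weakly equivalent if any path connects them. */
static bool weakly_equivalent(const struct weq_graph *g, int a, int b)
{
    return weq_path(g, a, b, -1);
}

/* a[i] = b[i] follows if some path connects a and b without updating index i. */
static bool equal_at(const struct weq_graph *g, int a, int b, int i)
{
    return weq_path(g, a, b, i);
}
```

For B = A{3 ← v} and C = B, the terms A and C are weakly equivalent and must agree at index 1, but the graph implies nothing about A[3] versus C[3].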


4.4 Memory Copying 


One of the main jobs of any stage of the boot flow is to copy the next stage 
into memory, usually using some variant of memcpy. Any tool analyzing boot 
code must have an efficient model of memcpy. Modeling memcpy as a loop iterating 
through a thousand bytes of memory leads to performance problems during 
program analysis. We added to CBMC an improved model of the memset and 
memcpy library functions. 

Boot code has no access to a C library. In our case, the boot code shipped 
an iterative implementation of memset and memcpy. CBMC’s model of the C 
library previously also used an iterative model. We replaced this iterative
model of memset and memcpy with a single array operation that can be han- 
dled efficiently by the decision procedure at the back end. We instructed CBMC 
to replace the boot code implementations with the CBMC model using the 
--remove-function-body command-line option described in Sect. 4.2. 


4.5 Linker Scripts 


Linking is the final stage in the process of transforming source code into an 
executable program. Compilation transforms source files into object files, which 
consist of several sections of related object code. A typical object file contains 
sections for executable code, read-only and read-write program data, debugging 
symbols, and other information. The linker combines several object files into a 
single executable object file, merging similar sections from each of the input files 
into single sections in the output executable. The linker combines and arranges 
the sections according to the directives in a linker script. Linker scripts are 
written in a declarative language [14]. 

The functionality of most programs is not sensitive to the exact layout of the
executable file; therefore, by default, the linker uses a generic linker script¹
whose directives are suited to laying out high-level programs. On the other
hand, low-level code (like boot loaders, kernels, and firmware) must often be 
hard-coded to address particular memory locations, which necessitates the use 
of a custom linker script. 

One use for a linker script is to place selected code into a specialized memory 
region like a tightly-coupled memory unit [5], which is a fast cache into which 
developers can place hot code. Another is device access via memory-mapped I/O 
as discussed in Sects. 4.1 and 4.2. Low-level programs address these hardware devices
by having a variable whose address in memory corresponds to the address that 
the hardware exposes. However, no programming language offers the ability to 
set a variable’s address from the program; the variable must instead be laid out 
at the right place in the object file, using linker script directives. 

While linker scripts are essential to implement the functionality of low-level 
code, their use in higher-level programs is uncommon. Thus, we know of no 
work that considers the role of linker scripts in static program analysis; a recent 
formal treatment of linkers [37] explicitly skips linker scripts. Ensuring that 
static analysis results remain correct in the presence of linker scripts is vital 
to verifying and finding bugs in low-level code; we next describe problems that 
linker scripts can create for static analyses. 


Linker Script Challenges. All variables used in C programs must be defined 
exactly once. Static analyses make use of the values of these variables to decide 
program correctness, provided that the source code of the program and libraries 
used is available. However, linker scripts also define symbols that can be accessed 
as variables from C source code. Since C code never defines these symbols, and
linker scripts are not written in C, the values of these symbols are unknown to
a static analyzer that is oblivious to linker scripts. If the correctness of code
depends on the values of these symbols, it cannot be verified. To make this
discussion concrete, consider the code in Fig. 1.

1 On Linux and macOS, running ld --verbose displays the default linker script.


/* main.c */
#include <string.h>

extern char text_start;
extern char text_size;
extern char scratch_start;

int main() {
  memcpy(&text_start,
         &scratch_start,
         (size_t)&text_size);
}

/* link.ld */
SECTIONS {
  .text : {
    text_start = .;
    *(.text)
  }
  text_size = SIZEOF(.text);
  .scratch : {
    scratch_start = .;
    . = . + 0x1000;
    scratch_end = .;
  }
}


Fig. 1. A C program using variables whose addresses are defined in a linker script.


This example, adapted from the GNU linker manual [14], shows the common 
pattern of copying an entire region of program code from one part of memory to 
another. The linker writes an executable file in accordance with the linker script
link.ld; the expression “.” (period) indicates the current byte offset into
the executable file. The script directs the linker to generate a code section called
.text and write the contents of the .text sections from each input file into that
section, and to create an empty 4 KiB long section called .scratch. The symbols
text_start and scratch_start are created at the address of the beginning of
the associated section. Similarly, the symbol text_size is created at the address
equal to the code size of the .text section. Since these symbols are defined in
the linker script, they can be freely used from the C program main.c (which
must declare the symbols as extern, but not define them). While the data at the 
symbols’ locations is likely garbage, the symbols’ addresses are meaningful; in 
the program, the addresses are used to copy data from one section to another. 

Contemporary static analysis tools fail to correctly model the behavior of this 
program because they model symbols defined in C code but not in linker scripts. 
Tools like SeaHorn [32] and KLEE [12] do support linking of the intermediate 
representation (IR) compiled from each of the source files with an IR linker. By 
using build wrappers like wllvm [46], they can even invoke the native system 
linker, which itself runs the linker script on the machine code sections of the 
object files. The actions of the native linker, however, are not propagated back to 
the IR linker, so the linked IR used for static analysis contains only information 
derived from C source, and not from linker scripts. As a result, these analyzers 
lack the required precision to prove that a safe program is safe: they generate
false positives because they have no way of knowing (for example) that a memcpy 
is walking over a valid region of memory defined in the linker script. 


Information Required for Precise Modeling. As we noted earlier in this 
section, linker scripts provide definitions to variables that may only be declared 
in C code, and whose addresses may be used in the program. In addition, linker 
scripts define the layout of code sections; the C program may copy data to and 
from these sections using variables defined in the linker script to demarcate valid 
regions inside the sections. Our aim is to allow the static analyzer to decide the 
memory safety of operations that use linker script definitions (if indeed they 
are safe, i.e., don’t access memory regions outside those defined in the linker 
script). To do this, the analyzer must know (referencing our example in Fig. 1 
but without loss of generality): 


1. that we are copying &text_size bytes starting from &text_start; 

2. that there exists a code section (i.e., a valid region of memory) whose starting 
address equals &text_start and whose size equals &text_size; 

3. the concrete values of that code section’s size and starting address. 


Fact 1 is derived from the source code; Fact 2—from parsing the linker script; 
and Fact 3—from disassembling the fully-linked executable, which will have had 
the sections and symbols laid out at their final addresses by the linker. 


Extending CBMC. CBMC compiles source files with a front-end that emu- 
lates the native compiler (gcc), but which adds an additional section to the end 
of the output binary [41]; this section contains the program encoded in CBMC’s 
analysis-friendly intermediate representation (IR). In particular, CBMC’s front- 
end takes the linker script as a command-line argument, just like gcc, and del- 
egates the final link to the system’s native linker. CBMC thus has access to the 
linker script and the final binary, which contains both native executable code 
and CBMC IR. We send linker script information to CBMC as follows: 


1. use CBMC’s front end to compile the code, producing a fully-linked binary, 
2. parse the linker script and disassemble the binary to get the required data, 
3. augment the IR with the definitions from the linker script and binary, and 

4. analyze the augmented intermediate representation. 


Our extensions are Steps 2 and 3, which we describe in more detail below. They 
are applicable to tools (like SeaHorn and KLEE) that use an IR linker (like 
llvm-link) before analyzing the IR. 


Extracting Linker Script Symbols. Our extension to CBMC reads a linker 
script and extracts the information that we need. For each code section, it 
extracts the symbols whose addresses mark the start and end of the section, 
if any; and the symbol whose address indicates the section size, if any. The 
sections key of Fig. 2 shows the information extracted from the linker script in 
Fig. 1. 
Extracting Linker Script Symbol Addresses. To remain architecture independent,
our extension uses the objdump program (part of the GNU Binutils [31])
to extract the addresses of all symbols in an object file (shown in the addresses 
key of Fig. 2). In this way, it obtains the concrete addresses of symbols defined 
in the linker script. 


{
  "sections": {
    "text": {
      "start": "text_start",
      "size": "text_size"
    },
    "scratch": {
      "start": "scratch_start",
      "end": "scratch_end"
    }
  },
  "addresses": {
    "text_start": "0x0200",
    "text_size": "0x0600",
    "scratch_start": "0x1000",
    "scratch_end": "0x2000"
  }
}


Fig. 2. Output from our linker script parser when run on the linker script in Fig. 1, on
a binary with a 1 KiB .text section and 4 KiB .scratch section.


Augmenting the Intermediate Representation. CBMC maintains a sym- 
bol table of all the variables used in the program. Variables that are declared 
extern in C code and never defined have no initial value in the symbol table. 
CBMC can still analyze code that contains undefined symbols, but as noted earlier
in this section, this can lead to incorrect verification results. Our extension
to CBMC extracts the information described above and integrates
it into the target program’s IR. For example, given the source code in Fig. 1,
CBMC will replace it with the code given in Fig. 3. 
In more detail, CBMC 


1. converts the types of linker symbols in the IR and symbol table to char *, 

2. updates all expressions involving linker script symbols to be consistent with 
this type change, 

3. creates the IR representation of C-language definitions of the linker script 
symbols, initializing them before the entry point of main(), and 

4. uses the __CPROVER_allocated_memory API described in Sect. 4.1 to mark code 
sections demarcated by linker script symbols as allocated. 


The first two steps are necessary because C will not let us set the address of 
a variable, but will let us store the address in a variable. CBMC thus changes 
the IR type of text_start to char *; sets the value of text_start to the address 
of text_start in the binary; and rewrites all occurrences of “&text_start” to 
“text_start”. This preserves the original semantics while allowing CBMC to 
model the program. The semantics of Step 4 is impossible to express in C, 
justifying the use of CBMC rather than a simple source-to-source transformation. 
/* Original code */
#include <string.h>

extern char text_start;
extern char text_size;
extern char scratch_start;

int main() {
  memcpy(&text_start,
         &scratch_start,
         (size_t)&text_size);
}

/* Code after CBMC's transformation */
#include <string.h>

char *text_start = (char *)0x0200;
char *text_size = (char *)0x0600;
char *scratch_start = (char *)0x1000;

int main() {
  __CPROVER_allocated_memory(0x0200, 0x0600);
  __CPROVER_allocated_memory(0x1000, 0x1000);
  memcpy(text_start,
         scratch_start,
         (size_t)text_size);
}


Fig. 3. Transformation performed by CBMC for linker-script-defined symbols.


5 Industrial Boot Code Verification 


In this section, we describe our experience proving memory safety of boot code 
running in an AWS data center. We give an exact statement of what we proved, 
we point out examples of the verification challenges mentioned in Sect. 4 and our 
solutions, and we go over the test harness and the results of running CBMC. 


[Fig. 4 diagram. Labels: Boot configuration (Straps, OTP); Device configuration; Boot sources; UART; Any boot configuration; Any device configuration; Any source; Any binary; No memory safety errors.]


Fig. 4. Boot code is free of memory safety errors.


We use CBMC to prove that 783 lines of AWS boot code are memory safe. 
Soundness of this proof by bounded model checking is achieved by having CBMC 
check its loop unwinding assertions (that loops have been sufficiently unwound). 
This boot code proceeds in two stages, as illustrated in Fig. 4. The first stage 
prepares the machine, loads the second stage from a boot source, and launches 


Model Checking Boot Code from AWS Data Centers 479 


the second stage. The behavior of the first stage is controlled by configuration 
information in hardware straps and one-time-programmable memory (OTP), 
and by device configuration. We show that no configuration will induce a memory 
safety error in the stage 1 boot code. 

More precisely, we prove: 


Assuming 
— a buffer for stage 2 code and a temporary buffer are both 1024 bytes, 
— the cryptographic, CRC computation, and printf methods have no side 
effects and can return unconstrained values, 
— the CBMC model of memcpy and memset, and 
— ignoring a loop that flashes the console lights when boot fails; 
then 
— for every boot configuration, 
— for every device configuration, 
— for each of the three boot sources, and 
— for every stage 2 binary, 
the stage 1 boot code will not exhibit any memory safety errors. 


Due to the second and third assumptions, we may be missing memory safety 
errors in these simple procedures. Memory safety of these procedures can be 
established in isolation. We find all memory safety errors in the remainder of the 
code, however, because making buffers smaller increases the chances they will 
overflow, and allowing methods to return unconstrained values increases the set 
of program behaviors considered. 

The code we present in this section is representative of the code we ana- 
lyzed, but the actual code is proprietary and not public. The open-source project 
rBoot [11] is 700 lines of boot code available to the public that exhibits most of 
the challenges we now discuss. 


5.1 Memory-Mapped I/O 


MMIO regions are not explicitly allocated in the code, but the addresses of 
these regions appear in the header files. For example, an MMIO region for the 
hardware straps is given with 

#define REG_BASE (0x1000) 


#define REG_BOOT_STRAP (REG_BASE + 0x110) 
#define REG_BOOT_CONF (REG_BASE + 0x124) 


Each of the last two macros denotes the start of a different MMIO region, leav- 
ing 0x14 bytes for the region named REG_BOOT_STRAP. Using the builtin function 
added to CBMC (Sect. 4.1), we declare this region in the test harness with 


__CPROVER_allocated_memory (REG_BOOT_STRAP, 0x14); 
5.2 Device Behavior


All of the devices accessed by the boot code are accessed via an API. For example, 
the API for the UART is given by

int UartInit(UART_PORT port, unsigned int baudRate);
void UartWriteByte(UART_PORT port, uint8_t byte);
uint8_t UartReadByte(UART_PORT port);


In this work, we havoc all of the devices to make our result as strong as 
possible. In other words, our device model allows a device read to return any 
value of the appropriate type, and still we can prove that (even in the context 
of a misbehaving device) the boot code does not exhibit a memory safety error. 
Because all devices have an API, we can havoc the devices using the command 
line option added to CBMC (Sect. 4.2), and invoke CBMC with 


--remove-function-body UartInit 
--remove-function-body UartReadByte 
--remove-function-body UartWriteByte 


5.3 Byte-Level Memory Access 


All devices are accessed at the byte level by computing an integer-valued address 
and coercing it to a pointer. For example, the following code snippets from 
BootOptionsParse show how reading the hardware straps from the MMIO region 
discussed above translates into a byte-level memory access. 


#define REG_READ(addr) (*(volatile uint32_t *)(addr))


regVal = REG_READ(REG_BOOT_STRAP);


In CBMC, this translates into an access into an array modeling memory at loca- 
tion 0x1000 + 0x110. Our optimized encoding of the theory of arrays (Sect. 4.3) 
enables CBMC to reason more efficiently about this kind of construct. 


5.4 Memory Copying 


The memset and memcpy procedures are heavily used in boot code. For example, 
the function used to copy the stage 2 boot code from flash memory amounts to 
a single, large memcpy. 
int SNOR_Read(unsigned int address,
              uint8_t* buff,
              unsigned int numBytes) {
  memcpy(buff,
         (void*)(address + REG_SNOR_BASE_ADDRESS),
         numBytes);
}


CBMC reasons more efficiently about this kind of code due to our loop-free 
model of memset and memcpy procedures as array operations (Sect. 4.4). 
5.5 Linker Scripts


Linker scripts allocate regions of memory and pass the addresses of these regions 
and other constants to the code through the symbol table. For example, the linker 
script defines a region to hold the stage 2 binary and passes the address and size 
of the region as the addresses of the symbols stage2_start and stage2_size. 


.stage2 (NOLOAD) : { 
stage2_start = .; 
. = . + STAGE2_SIZE;
stage2_end = .; 
} > RAM2 
stage2_size = SIZEOF(.stage2); 


The code declares the symbols as externally defined, and uses a pair of macros 
to convert the addresses of the symbols to an address and a constant before use. 


extern char stage2_start[]; 
extern char stage2_size[]; 


#define STAGE2_ADDRESS ((uint8_t*)(&stage2_start))
#define STAGE2_SIZE ((unsigned)(&stage2_size))


CBMC’s new approach to handling linker scripts modifies the CBMC interme- 
diate representation of this code as described in Sect. 4.5. 


5.6 Test Harness 


The main procedure for the boot code begins by clearing the BSS section, copying 
a small amount of data from a ROM, printing some debugging information, and 
invoking three functions 


SecuritySettingsOtp();
BootOptionsParse();
Stage2LoadAndExecute();


that read security settings from some one-time programmable memory, read the 
boot options from some hardware straps, and load and launch the stage 2 code. 
The test harness for the boot code is 76 lines of code that look similar to


void environment_model() {
  __CPROVER_allocated_memory(REG_BOOT_STRAP, 0x14);
  __CPROVER_allocated_memory(REG_UART_UART_BASE,
                             UART_REG_OFFSET_LSR + sizeof(uint32_t));
  __CPROVER_allocated_memory(REG_NAND_CONFIG_REG,
                             sizeof(uint32_t));
}

void harness() {
  environment_model();
  SecuritySettingsOtp();
  BootOptionsParse();
  Stage2LoadAndExecute();
}


The environment_model procedure defines the environment of the software under 
test not declared in the boot code itself. This environment includes more than 
30 MMIO regions for hardware like some hardware straps, a UART, and some 
NAND memory. The fragment of the environment model reproduced above uses 
the __CPROVER_allocated_memory built-in function added to CBMC for this work 
to declare these MMIO regions and assign them unconstrained values (model- 
ing unconstrained configuration information). The harness procedure is the test 
harness itself. It builds the environment model and calls the three procedures 
invoked by the boot code. 


5.7 Running CBMC 


Building the boot code and test harness for CBMC takes 8.2s compared to 
building the boot code with gcc in 2.2s. 

Run on the test harness above as a job under AWS Batch, CBMC
finished successfully in 10:02 min. It ran on a 16-core server with 122 GiB of
memory running Ubuntu 14.04, and consumed one core at 100% using 5 GiB of 
memory. The new encoding of arrays improved this time by 45s. 

The boot code consists of 783 lines of statically reachable code, meaning the 
number of lines of code in the functions that are reachable from the test harness 
in the function call graph. CBMC achieves complete code coverage, in the sense 
that every line of code CBMC fails to exercise is dead code. An example of dead 
code found in the boot code is the default case of a switch statement whose cases 
enumerate all possible values of an expression. 

The boot code contains 98 loops that fall into two classes. First are
for-loops with constant-valued expressions for the upper and lower bounds. Second
are loops of the form while (num) {...; num--} and code inspection yields a 
bound on num. Thus, it is possible to choose loop bounds that cause all loop- 
unwinding assertions to hold, making CBMC’s results sound for boot code. 


6 Conclusion 


This paper describes industrial experience with model checking production code. 
We extended CBMC to address issues that arise in boot code, and we proved that 
initial boot code running in data centers at Amazon Web Services is memory safe, 
a significant application of model checking in the industry. Our most significant 
extension to CBMC was parsing linker scripts to extract the memory layout 
described there for use in model checking, making CBMC the first static analysis 
tool to do so. With this and our other extensions to CBMC supporting devices 
and byte-level access, CBMC can now be used in a continuous validation flow to 
check for memory safety during code development. All of these extensions are in 
the public domain and freely available for immediate use. 
Abstract. In this paper, we propose Android Stack Machine (ASM), a
formal model to capture key mechanisms of Android multi-tasking such 
as activities, back stacks, launch modes, as well as task affinities. The 
model is based on pushdown systems with multiple stacks, and focuses 
on the evolution of the back stack of the Android system when interact- 
ing with activities carrying specific launch modes and task affinities. For 
formal analysis, we study the reachability problem of ASM. While the 
general problem is shown to be undecidable, we identify expressive frag- 
ments for which various verification techniques for pushdown systems or 
their extensions are harnessed to show decidability of the problem. 


1 Introduction 


Multi-tasking plays a central role in the Android platform. Its unique design, via 
activities and back stacks, greatly facilitates organizing user sessions through 
tasks, and provides rich features such as handy application switching, back- 
ground app state maintenance, smooth task history navigation (using the “back” 
button), etc. [16]. We refer readers to Sect. 2 for an overview.

The Android task management mechanism has substantially enhanced the user
experience of the Android system and promoted personalized features in app design.
However, the mechanism is also notoriously difficult to understand. As evidence,
it constantly baffles app developers and has become a common topic of
question-and-answer websites (for instance, [2]). Surprisingly, the Android
multi-tasking mechanism, despite its importance, has not been thoroughly studied
before, let alone given a formal treatment. This has impeded further development
of computer-aided (static) analysis and verification for Android apps, which are
indispensable for vulnerability analysis (for example, detection of task hijacking
[16]) and app performance enhancement (for example, estimation of energy consumption [8]).

This paper provides a formal model, i.e., Android Stack Machine (ASM), 
aiming to capture the key features of Android multi-tasking. ASM addresses the 
behavior of Android back stacks, a key component of the multi-tasking machinery,
and their interplay with attributes of the activity. In this paper, for these
attributes we consider four basic launch modes, i.e., standard (STD), singleTop
(STP), singleTask (STK), singleInstance (SIT), and task affinities. (For simplicity,
more complicated activity attributes such as allowTaskReparenting will not
be addressed in the present paper.) We believe that the semantics of ASM,
specified as a transition system, faithfully captures the actual mechanism of Android
systems. For each case of the semantics, we have created “diagnosis” apps with 
corresponding launch modes and task affinities, and carried out extensive exper- 
iments using these apps, ascertaining its conformance to the Android platform. 
(Details will be provided in Sect. 3.) 

For Android, technically ASM can be viewed as the counterpart of pushdown
systems with multiple stacks, which are the de facto model for (multi-threaded)
concurrent programs. Being rigorous, this model opens a door towards a formal
account of Android’s multi-tasking mechanism, which would greatly facilitate
developers’ understanding, freeing them from lengthy, ambiguous, elusive Android
documentation. We remark that the evolution of Android back stacks is known to
also be affected by the intent flags of the activities. ASM does not address intent
flags explicitly. However, the effects of most intent flags (e.g.,
FLAG_ACTIVITY_NEW_TASK, FLAG_ACTIVITY_CLEAR_TOP) can be simulated by launch
modes, so this is not a real limitation of ASM.

Based on ASM, we also make the first step towards a formal analysis of 
Android multi-tasking apps by investigating the reachability problem which is 
fundamental to all such analysis. ASM is akin to pushdown systems with multiple 
stacks, so it is perhaps not surprising that the problem is undecidable in general; 
in fact, we show undecidability for most interesting fragments even with just two 
launch modes. In the interest of seeking more expressive, practice-relevant decid- 
able fragments, we identify a fragment STK-dominating ASM which assumes 
STK activities have different task affinities and which further restricts the use 
of SIT activities. This fragment covers a majority of open-source Android apps 
(e.g., from GitHub) we have found so far. One of our technical contributions
is to give a decision procedure for the reachability problem of STK-dominating 
ASM, which combines a range of techniques from simulations by pushdown sys- 
tems with transductions [19] to abstraction methods for multi-stacks. The work, 
apart from independent interests in the study of multi-stack pushdown systems, 
lays a solid foundation for further (static) analysis and verification of Android 
apps related to multi-tasking, enabling model checking of Android apps, secu- 
rity analysis (such as discovering task hijacking), or typical tasks in software 
engineering such as automatic debugging, model-based testing, etc. 


Android Stack Machine 489 


We summarize the main contributions as follows: (1) We propose—to the best 
of our knowledge—the first comprehensive formal model, Android stack machine, 
for Android back stacks, which is also validated by extensive experiments. (2) We 
study the reachability problem for Android stack machine. Apart from the strongest
possible undecidability results in the general case, we provide a decision procedure
for a practically relevant fragment. 


2 Android Stack Machine: An Informal Overview 


In Android, an application, usually referred to as an app, is regarded as a
collection of activities. An activity is a type of app component, an instance of which
provides a graphical user interface on screen and serves as the entry point for
interacting with the user [1]. An app typically has many activities for different user
interactions (e.g., dialling phone numbers, reading contact lists, etc). A distin- 
guished activity is the main activity, which is started when the app is launched. 
A task is a collection of activities that users interact with when performing a 
certain job. The activities in a task are arranged in a stack in the order in which 
each activity is opened. For example, an email app might have one activity to 
show a list of latest messages. When the user selects a message, a new activity 
opens to view that message. This new activity is pushed to the stack. If the user 
presses the “Back” button, an activity is finished and is popped off the stack. 
[In practice, the onBackPressed() method can be overridden and is triggered when
the “Back” button is clicked. Here we assume, as a model abstraction, that
the onBackPressed() method is not overridden.] Furthermore, multiple tasks
may run concurrently in the Android platform and the back stack stores all the 
tasks as a stack as well. In other words, it has a nested structure being a stack 
of stacks (tasks). We remark that in Android, activities from different apps can
stay in the same task, and activities from the same app can enter different tasks. 

Typically, the evolution of the back stack is dependent mainly on two 
attributes of activities: launch modes and task affinities. All the activities of an 
app, as well as their attributes, including the launch modes and task affinities, 
are defined in the manifest file of the app. The launch mode of an activity decides 
the corresponding operation of the back stack when the activity is launched. As 
mentioned in Sect. 1, there are four basic launch modes in Android: “standard”, 
“singleTop”, “singleTask” and “singleInstance”. The task affinity of an activity 
indicates to which task the activity prefers to belong. By default, all the activ- 
ities from the same app have the same affinity (i.e., all activities in the same 
app prefer to be in the same task). However, one can modify the default affinity 
of the activity. Activities defined in different apps can share a task affinity, or 
activities defined in the same app can be assigned different task affinities.
Below we will use a simple app to demonstrate the evolution of the back stack. 


Example 1. In Fig. 1, an app ActivitiesLaunchDemo¹ is illustrated. The app
contains four activities of the launch modes STD, STP, STK and SIT, depicted
by green, blue, yellow and red, respectively. We will use the colours to name the
activities. The green, blue and red activities have the same task affinity, while
the yellow activity has a distinct one. The main activity of the app is the green
activity. Each activity contains four buttons, i.e., the green, blue, yellow and red
button. When a button is clicked, an instance of the activity with that colour
starts. Moreover, the identifiers of all the tasks of the back stack, as well as
their contents, are shown in the white zones of the window. We use the following
execution trace to demonstrate how the back stack evolves according to the
launch modes and the task affinities of the activities: the user clicks the buttons
in the order green, blue, blue, yellow, red, and green.

1 Adapted from an open-source app https://github.com/wauoen/LaunchModeDemo.


1. [Launch the app] When the app is launched, an instance of the main activity
starts, and the back stack contains exactly one task, which contains exactly 
one green activity (see Fig. 1(a)). For convenience, this task is called the green 
task (with id: 23963). 

2. [Start an STD activity] When the green button is clicked, since the launch 
mode of the green activity is STD, a new instance of the green activity starts 
and is pushed into the green task (see Fig. 1(b)). 

3. [Start an STP activity] When the blue button is clicked, since the top activity 
of the green task is not the blue activity, a new instance of the blue activity 
is pushed into the green task (see Fig. 1(c)). On the other hand, if the blue 
button is clicked again, because the launch mode of the blue activity is STP 
and the top activity of the green task is already the blue one, a new instance 
of the blue activity will not be pushed into the green task and its content is 
kept unchanged. 

4. [Start an STK activity] Suppose now that the yellow button is clicked. Since
the launch mode of the yellow activity is STK, and the task affinity of the
yellow activity is different from that of the bottom activity of the green task, 
a new task is created and an instance of the yellow activity is pushed into 
the new task (called the yellow task, with id: 23964, see Fig. 1(d), where the 
leftmost task is the top task of the back stack). 

5. [Start an SIT activity] Next, suppose that the red button is clicked. Because
the launch mode of the red activity is SIT, a new task is created and an
instance of the red activity is pushed into the new task (called the red task, 
with id: 23965, see Fig. 1(e)). Moreover, at any future moment, the red activity 
is the only activity of the red task. Note that here a new task is created in 
spite of the affinity of the red activity. 

6. [Start an STD activity from an SIT activity] Finally, suppose the green button 
is clicked again. Since the top task is the red task, which is supposed to contain 
only one activity (i.e., the red activity), the green task is then moved to the 
top of the back stack and a new instance of the green activity is pushed into 
the green task (see Fig. 1(f)). 


3 Android Stack Machine 


For $k \in \mathbb{N}$, let $[k] = \{1, \ldots, k\}$. For a function $f : X \to Y$, let $\mathrm{dom}(f)$ and $\mathrm{rng}(f)$
denote the domain ($X$) and range ($Y$) of $f$ respectively.




Fig. 1. ActivitiesLaunchDemo: the running example (Color figure online) 


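Definition 1 (Android stack machine). An Android stack machine (ASM)
is a tuple $\mathcal{A} = (Q, \mathrm{Sig}, q_0, \Delta)$, where

- $Q$ is a finite set of control states, and $q_0 \in Q$ is the initial state,
- $\mathrm{Sig} = (\mathrm{Act}, \mathrm{Lmd}, \mathrm{Aft}, A_0)$ is the activity signature, where
  • $\mathrm{Act}$ is a finite set of activities,
  • $\mathrm{Lmd} : \mathrm{Act} \to \{\mathrm{STD}, \mathrm{STP}, \mathrm{STK}, \mathrm{SIT}\}$ is the launch-mode function,
  • $\mathrm{Aft} : \mathrm{Act} \to [m]$ is the task-affinity function, where $m = |\mathrm{Act}|$,
  • $A_0 \in \mathrm{Act}$ is the main activity,
- $\Delta \subseteq Q \times (\mathrm{Act} \cup \{\triangleright\}) \times \mathrm{Inst} \times Q$ is the transition relation, where $\mathrm{Inst} = \{\square, \mathrm{back}\} \cup \{\mathrm{start}(A) \mid A \in \mathrm{Act}\}$, such that (1) for each transition $(q, A, \alpha, q') \in \Delta$, it holds that $q' \neq q_0$, and (2) for each transition $(q, \triangleright, \alpha, q') \in \Delta$, it holds that $q = q_0$, $\alpha = \mathrm{start}(A_0)$, and $q' \neq q_0$.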


For convenience, we usually write a transition $(q, A, \alpha, q') \in \Delta$ as $q \xrightarrow{A, \alpha} q'$,
and $(q, \triangleright, \alpha, q') \in \Delta$ as $q \xrightarrow{\triangleright, \alpha} q'$. Intuitively, $\triangleright$ denotes an empty back stack, $\square$
denotes that there is no change to the back stack, back denotes the pop action, and
$\mathrm{start}(A)$ denotes that the activity $A$ is started. We assume that, if the back stack
is empty, the Android stack system terminates (i.e., no further continuation is
possible) unless it is in the initial state $q_0$. We use $\mathrm{Act}_x$ to denote $\{B \in \mathrm{Act} \mid
\mathrm{Lmd}(B) = x\}$ for $x \in \{\mathrm{STD}, \mathrm{STP}, \mathrm{STK}, \mathrm{SIT}\}$.
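As an illustration only, the side conditions (1) and (2) on $\Delta$ in Definition 1 can be checked mechanically. The Python sketch below is our own encoding, not part of the paper: the names `ASM`, `well_formed`, `EMPTY` (a stand-in for $\triangleright$) and `NOOP` (a stand-in for $\square$) are hypothetical, and an instruction is represented as `"noop"`, `"back"` or a pair `("start", B)`.

```python
from dataclasses import dataclass

LAUNCH_MODES = {"STD", "STP", "STK", "SIT"}
EMPTY = ">"     # hypothetical stand-in for the symbol ▷ (empty back stack)
NOOP = "noop"   # hypothetical stand-in for the instruction □


@dataclass
class ASM:
    states: set   # Q
    q0: str       # initial state
    lmd: dict     # Lmd: activity -> launch mode
    aft: dict     # Aft: activity -> affinity in [m], m = |Act|
    main: str     # main activity A_0
    delta: set    # tuples (q, A or EMPTY, instruction, q')


def is_instruction(inst, acts):
    return inst in (NOOP, "back") or (
        isinstance(inst, tuple) and len(inst) == 2 and inst[0] == "start" and inst[1] in acts
    )


def well_formed(asm):
    """Check the typing of Sig and conditions (1) and (2) on Delta (Definition 1)."""
    acts = set(asm.lmd)
    m = len(acts)
    if asm.q0 not in asm.states or asm.main not in acts or set(asm.aft) != acts:
        return False
    if not set(asm.lmd.values()) <= LAUNCH_MODES:
        return False
    if not all(1 <= k <= m for k in asm.aft.values()):
        return False
    for q, a, inst, q2 in asm.delta:
        if q not in asm.states or q2 not in asm.states or not is_instruction(inst, acts):
            return False
        if a == EMPTY:
            # (2): a ▷-transition leaves q0 by starting A_0 and does not return to q0
            if q != asm.q0 or inst != ("start", asm.main) or q2 == asm.q0:
                return False
        elif a in acts:
            # (1): no transition labelled by an activity enters q0
            if q2 == asm.q0:
                return False
        else:
            return False
    return True


# A toy ASM with one STD activity A that is started and can be popped.
toy = ASM({"q0", "q1"}, "q0", {"A": "STD"}, {"A": 1}, "A",
          {("q0", EMPTY, ("start", "A"), "q1"), ("q1", "A", "back", "q1")})
print(well_formed(toy))  # True
```

Adding, for instance, the transition `("q1", "A", NOOP, "q0")` would violate condition (1), and `well_formed` would then return `False`.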


Semantics. Let $\mathcal{A} = (Q, \mathrm{Sig}, q_0, \Delta)$ be an ASM with $\mathrm{Sig} = (\mathrm{Act}, \mathrm{Lmd}, \mathrm{Aft}, A_0)$.
A task of $\mathcal{A}$ is encoded as a word $S = [A_1, \ldots, A_n] \in \mathrm{Act}^+$ which denotes
the content of the stack, with $A_1$ (resp. $A_n$) as the top (resp. bottom) symbol,
denoted by $\mathrm{top}(S)$ (resp. $\mathrm{btm}(S)$). We also call the bottom activity of a
non-empty task $S$ the root activity of the task. (Intuitively, this is the
first activity of the task.) For $x \in \{\mathrm{STD}, \mathrm{STP}, \mathrm{STK}, \mathrm{SIT}\}$, a task $S$ is called an
$x$-task if $\mathrm{Lmd}(\mathrm{btm}(S)) = x$. We define the affinity of a task $S$, denoted by $\mathrm{Aft}(S)$,
to be $\mathrm{Aft}(\mathrm{btm}(S))$. For $S_1 \in \mathrm{Act}^*$ and $S_2 \in \mathrm{Act}^*$, we use $S_1 \cdot S_2$ to denote the
concatenation of $S_1$ and $S_2$, and $\varepsilon$ is used to denote the empty word in $\mathrm{Act}^*$.
As mentioned in Sect. 2, the (running) tasks on Android are organized as
the back stack, which is the main modelling object of ASM. Typically we write
a back stack $\rho$ as a sequence of non-empty tasks, i.e., $\rho = (S_1, \ldots, S_n)$, where
$S_1$ and $S_n$ are called the top and the bottom task respectively. (Intuitively, $S_1$
is the currently active task.) $\varepsilon$ is used to denote the empty back stack. For a
non-empty back stack $\rho = (S_1, \ldots, S_n)$, we overload top by using $\mathrm{top}(\rho)$ to refer
to the task $S_1$, and thus $\mathrm{top}^2(\rho)$ to refer to the top activity of $S_1$.


Definition 2 (Configurations). A configuration of $\mathcal{A}$ is a pair $(q, \rho)$ where $q \in
Q$ and $\rho$ is a back stack. Assume that $\rho = (S_1, \ldots, S_n)$ with $S_i = [A_{i,1}, \ldots, A_{i,m_i}]$
for each $i \in [n]$. We require $\rho$ to satisfy the following constraints:


1. For each $A \in \mathrm{Act}_{\mathrm{STK}}$ or $A \in \mathrm{Act}_{\mathrm{SIT}}$, $A$ occurs in at most one task. Moreover,
if $A$ occurs in a task, then $A$ occurs at most once in that task. [At most one
instance for each STK/SIT-activity]

2. For each $i \in [n]$ and $j \in [m_i - 1]$ such that $A_{i,j} \in \mathrm{Act}_{\mathrm{STP}}$, $A_{i,j} \neq A_{i,j+1}$.
[Non-stuttering for STP-activities]

3. For each $i \in [n]$ and $j \in [m_i]$ such that $A_{i,j} \in \mathrm{Act}_{\mathrm{STK}}$, $\mathrm{Aft}(A_{i,j}) = \mathrm{Aft}(S_i)$.
[Affinities of STK-activities agree with the host task]

4. For each $i \in [n]$ and $j \in [m_i]$ such that $A_{i,j} \in \mathrm{Act}_{\mathrm{SIT}}$, $m_i = 1$. [SIT-activities
monopolize a task]

5. For $i \neq j \in [n]$ such that $\mathrm{btm}(S_i) \notin \mathrm{Act}_{\mathrm{SIT}}$ and $\mathrm{btm}(S_j) \notin \mathrm{Act}_{\mathrm{SIT}}$, $\mathrm{Aft}(S_i) \neq
\mathrm{Aft}(S_j)$. [Affinities of tasks are mutually distinct, except for those
rooted at SIT-activities]


By Definition 2(5), each back stack $\rho$ contains at most $|\mathrm{Act}_{\mathrm{SIT}}| + |\mathrm{rng}(\mathrm{Aft})|$
(more precisely, $|\mathrm{Act}_{\mathrm{SIT}}| + |\{\mathrm{Aft}(A) \mid A \in \mathrm{Act} \setminus \mathrm{Act}_{\mathrm{SIT}}\}|$) tasks. Moreover, by
Definition 2(1–5), all the root activities in a configuration are pairwise distinct,
which allows us to refer to a task whose root activity is $A$ as the $A$-task.
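The five constraints are easy to check mechanically. The following Python sketch is an illustration under our own encoding: the function `is_configuration` is hypothetical, a back stack is a list of tasks (top task first), and each task lists its activities top first, so `task[-1]` is the root activity. The activity names and attributes are those of the running example, with the launch modes and affinities that are listed in Example 2.

```python
def is_configuration(rho, lmd, aft):
    """Check constraints (1)-(5) of Definition 2 for a back stack rho.

    rho: list of tasks, top task first; each task is a list of activities,
    top activity first (so task[-1] is the root activity btm(S)).
    """
    if any(not task for task in rho):
        return False                              # tasks are non-empty words
    task_aft = [aft[task[-1]] for task in rho]    # Aft(S) = Aft(btm(S))
    for a, mode in lmd.items():
        if mode in ("STK", "SIT") and sum(task.count(a) for task in rho) > 1:
            return False                          # (1) at most one instance
    for i, task in enumerate(rho):
        for j, a in enumerate(task):
            if lmd[a] == "STP" and j + 1 < len(task) and task[j + 1] == a:
                return False                      # (2) no stuttering STP-activity
            if lmd[a] == "STK" and aft[a] != task_aft[i]:
                return False                      # (3) STK affinity matches host task
            if lmd[a] == "SIT" and len(task) != 1:
                return False                      # (4) SIT activity monopolizes its task
    roots = [task_aft[i] for i, task in enumerate(rho) if lmd[task[-1]] != "SIT"]
    return len(roots) == len(set(roots))          # (5) distinct non-SIT affinities


# Activities of the running example (green, blue, yellow, red).
LMD = {"Ag": "STD", "Ab": "STP", "Ay": "STK", "Ar": "SIT"}
AFT = {"Ag": 1, "Ab": 1, "Ay": 2, "Ar": 1}

print(is_configuration([["Ag", "Ab", "Ag", "Ag"], ["Ar"], ["Ay"]], LMD, AFT))  # True
print(is_configuration([["Ag"], ["Ab"]], LMD, AFT))  # False: violates (5)
```

The first back stack is the one reached at the end of Example 1; the second puts two non-SIT tasks of affinity 1 side by side, which Definition 2(5) forbids.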

Let $\mathrm{Conf}_{\mathcal{A}}$ denote the set of configurations of $\mathcal{A}$. The initial configuration of
$\mathcal{A}$ is $(q_0, \varepsilon)$. To formalize the semantics of $\mathcal{A}$ concisely, we introduce the following
shorthand stack operations and one auxiliary function. Here $\rho = (S_1, \ldots, S_n)$ is
a non-empty back stack.


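$$
\begin{aligned}
\mathrm{Noaction}(\rho) &= \rho, & \mathrm{Push}(\rho, B) &= ([B] \cdot S_1, S_2, \ldots, S_n),\\
\mathrm{NewTask}(B) &= ([B]), & \mathrm{NewTask}(\rho, B) &= ([B], S_1, \ldots, S_n),
\end{aligned}
$$

$$
\mathrm{Pop}(\rho) = \begin{cases}
\varepsilon, & \text{if } n = 1 \text{ and } S_1 = [A];\\
(S_2, \ldots, S_n), & \text{if } n > 1 \text{ and } S_1 = [A];\\
(S_1', S_2, \ldots, S_n), & \text{if } S_1 = [A] \cdot S_1' \text{ with } S_1' \in \mathrm{Act}^+;
\end{cases}
$$

$$
\mathrm{PopUntil}(\rho, B) = (S_1'', S_2, \ldots, S_n), \text{ where } S_1 = S_1' \cdot S_1'' \text{ with } S_1' \in (\mathrm{Act} \setminus \{B\})^* \text{ and } \mathrm{top}(S_1'') = B;
$$

$$
\mathrm{Move2Top}(\rho, i) = (S_i, S_1, \ldots, S_{i-1}, S_{i+1}, \ldots, S_n);
$$

$$
\mathrm{GetNonSITTaskByAft}(\rho, k) = \begin{cases}
S_i, & \text{if } \mathrm{Aft}(S_i) = k \text{ and } \mathrm{Lmd}(\mathrm{btm}(S_i)) \neq \mathrm{SIT};\\
\mathrm{Undef}, & \text{otherwise}.
\end{cases}
$$


Intuitively, $\mathrm{GetNonSITTaskByAft}(\rho, k)$ returns a non-SIT task whose affinity
is $k$ if it exists, and otherwise returns Undef.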

In the sequel, we define the transition relation $(q, \rho) \to_{\mathcal{A}} (q', \rho')$ on $\mathrm{Conf}_{\mathcal{A}}$ to
formalize the semantics of $\mathcal{A}$. We start with the transitions out of the initial
state $q_0$ and those with the $\square$ or back action.




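- For each transition $q_0 \xrightarrow{\triangleright, \mathrm{start}(A_0)} q$, $(q_0, \varepsilon) \to_{\mathcal{A}} (q, \mathrm{NewTask}(A_0))$.
- For each transition $q \xrightarrow{A, \square} q'$ and $(q, \rho) \in \mathrm{Conf}_{\mathcal{A}}$ such that $\mathrm{top}^2(\rho) = A$,
$(q, \rho) \to_{\mathcal{A}} (q', \mathrm{Noaction}(\rho))$.
- For each transition $q \xrightarrow{A, \mathrm{back}} q'$ and $(q, \rho) \in \mathrm{Conf}_{\mathcal{A}}$ such that $\mathrm{top}^2(\rho) = A$,
$(q, \rho) \to_{\mathcal{A}} (q', \mathrm{Pop}(\rho))$.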
The most interesting case is, however, the transitions of the form $q \xrightarrow{A, \mathrm{start}(B)}
q'$. We shall make case distinctions based on the launch mode of $B$. For each
transition $q \xrightarrow{A, \mathrm{start}(B)} q'$ and $(q, \rho) \in \mathrm{Conf}_{\mathcal{A}}$ such that $\mathrm{top}^2(\rho) = A$, $(q, \rho) \to_{\mathcal{A}}
(q', \rho')$ if one of the following cases holds. Assume $\rho = (S_1, \ldots, S_n)$.
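Case $\mathrm{Lmd}(B) = \mathrm{STD}$

- $\mathrm{Lmd}(A) \neq \mathrm{SIT}$, then $\rho' = \mathrm{Push}(\rho, B)$;
- $\mathrm{Lmd}(A) = \mathrm{SIT}$ (see footnote 2), then
  • if $\mathrm{GetNonSITTaskByAft}(\rho, \mathrm{Aft}(B)) = S_i$ (see footnote 3), then $\rho' = \mathrm{Push}(\mathrm{Move2Top}(\rho, i), B)$,
  • if $\mathrm{GetNonSITTaskByAft}(\rho, \mathrm{Aft}(B)) = \mathrm{Undef}$, then $\rho' = \mathrm{NewTask}(\rho, B)$;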


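Case $\mathrm{Lmd}(B) = \mathrm{STP}$

- $\mathrm{Lmd}(A) \neq \mathrm{SIT}$ and $A \neq B$, then $\rho' = \mathrm{Push}(\rho, B)$;
- $\mathrm{Lmd}(A) \neq \mathrm{SIT}$ and $A = B$, then $\rho' = \mathrm{Noaction}(\rho)$;
- $\mathrm{Lmd}(A) = \mathrm{SIT}$ (see footnote 2),
  • if $\mathrm{GetNonSITTaskByAft}(\rho, \mathrm{Aft}(B)) = S_i$ (see footnote 3), then
    * if $\mathrm{top}(S_i) \neq B$, $\rho' = \mathrm{Push}(\mathrm{Move2Top}(\rho, i), B)$,
    * if $\mathrm{top}(S_i) = B$, $\rho' = \mathrm{Move2Top}(\rho, i)$;
  • if $\mathrm{GetNonSITTaskByAft}(\rho, \mathrm{Aft}(B)) = \mathrm{Undef}$, then $\rho' = \mathrm{NewTask}(\rho, B)$;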


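Case $\mathrm{Lmd}(B) = \mathrm{SIT}$

- $A = B$ (see footnote 2), then $\rho' = \mathrm{Noaction}(\rho)$;
- $A \neq B$ and $S_i = [B]$ for some $i \in [n]$ (see footnote 4), then $\rho' = \mathrm{Move2Top}(\rho, i)$;
- $A \neq B$ and $S_i \neq [B]$ for each $i \in [n]$, then $\rho' = \mathrm{NewTask}(\rho, B)$;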


Case $\mathrm{Lmd}(B) = \mathrm{STK}$

- $\mathrm{Lmd}(A) \neq \mathrm{SIT}$ and $\mathrm{Aft}(B) = \mathrm{Aft}(S_1)$, then
  • if $B$ does not occur in $S_1$ (see footnote 5), then $\rho' = \mathrm{Push}(\rho, B)$;
  • if $B$ occurs in $S_1$ (see footnote 6), then $\rho' = \mathrm{PopUntil}(\rho, B)$;
- $\mathrm{Lmd}(A) \neq \mathrm{SIT} \Rightarrow \mathrm{Aft}(B) \neq \mathrm{Aft}(S_1)$, then
  • if $\mathrm{GetNonSITTaskByAft}(\rho, \mathrm{Aft}(B)) = S_i$ (see footnote 7),
    * if $B$ does not occur in $S_i$ (see footnote 5), then $\rho' = \mathrm{Push}(\mathrm{Move2Top}(\rho, i), B)$;
    * if $B$ occurs in $S_i$ (see footnote 8), then $\rho' = \mathrm{PopUntil}(\mathrm{Move2Top}(\rho, i), B)$;
  • if $\mathrm{GetNonSITTaskByAft}(\rho, \mathrm{Aft}(B)) = \mathrm{Undef}$, then $\rho' = \mathrm{NewTask}(\rho, B)$.

2 By Definition 2(4), $S_1 = [A]$.
3 If $i$ exists, it must be unique by Definition 2(5). Moreover, $i > 1$, as $\mathrm{Lmd}(A) = \mathrm{SIT}$.
4 If $i$ exists, it must be unique by Definition 2(1). Moreover, $i > 1$, as $A \neq B$.
5 $B$ does not occur in $\rho$ at all by Definition 2(3–5).
6 Note that $B$ occurs at most once in $S_1$ by Definition 2(1).


This concludes the definition of the transition relation of $\mathcal{A}$. As usual, we
use $\to_{\mathcal{A}}^*$ to denote the reflexive and transitive closure of $\to_{\mathcal{A}}$.
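To make the case analysis concrete, here is a small Python sketch of our own (not the authors' tool) that implements the stack operations and the four $\mathrm{start}(B)$ cases literally. Tasks are lists whose first element is the top activity, the back stack lists the top task first, and indices are 0-based, so the paper's $i$ corresponds to `i + 1` here. The names `start`, `replay`, `LMD` and `AFT` are hypothetical; the launch modes and affinities are those listed in Example 2. Replaying the clicks of Example 1 yields the back stacks described there.

```python
STD, STP, STK, SIT = "STD", "STP", "STK", "SIT"

# A back stack is a list of tasks, top task first; a task is a list of
# activities, top activity first. The empty back stack is [].

def push(rho, b):                  # Push(rho, B)
    return [[b] + rho[0]] + rho[1:]

def new_task(rho, b):              # NewTask(rho, B); NewTask(B) is new_task([], b)
    return [[b]] + rho

def pop(rho):                      # Pop(rho)
    rest = rho[0][1:]
    return ([rest] if rest else []) + rho[1:]

def pop_until(rho, b):             # PopUntil(rho, B): keep S_1 from its topmost B down
    s1 = rho[0]
    return [s1[s1.index(b):]] + rho[1:]

def move2top(rho, i):              # Move2Top(rho, i) with a 0-based index i
    return [rho[i]] + rho[:i] + rho[i + 1:]

def non_sit_task_by_aft(rho, k, lmd, aft):   # GetNonSITTaskByAft; None plays Undef
    for i, task in enumerate(rho):
        if aft[task[-1]] == k and lmd[task[-1]] != SIT:
            return i
    return None

def start(rho, b, lmd, aft):
    """Back stack after a transition (q, A, start(B), q') with A = top^2(rho)."""
    s1 = rho[0]
    a = s1[0]
    a_is_sit = lmd[a] == SIT
    if lmd[b] == SIT:
        if a == b:
            return rho
        for i, task in enumerate(rho):
            if task == [b]:
                return move2top(rho, i)
        return new_task(rho, b)
    if not a_is_sit and lmd[b] in (STD, STP):
        return rho if (lmd[b] == STP and a == b) else push(rho, b)
    if not a_is_sit and lmd[b] == STK and aft[b] == aft[s1[-1]]:
        return pop_until(rho, b) if b in s1 else push(rho, b)
    # Remaining cases: A is a SIT activity, or B is STK with Aft(B) != Aft(S_1).
    i = non_sit_task_by_aft(rho, aft[b], lmd, aft)
    if i is None:
        return new_task(rho, b)
    moved = move2top(rho, i)
    if lmd[b] == STD:
        return push(moved, b)
    if lmd[b] == STP:
        return moved if moved[0][0] == b else push(moved, b)
    return pop_until(moved, b) if b in moved[0] else push(moved, b)


# Green, blue, yellow and red activities of ActivitiesLaunchDemo.
LMD = {"Ag": STD, "Ab": STP, "Ay": STK, "Ar": SIT}
AFT = {"Ag": 1, "Ab": 1, "Ay": 2, "Ar": 1}

def replay(clicks, main="Ag"):
    rho = new_task([], main)       # launching the app: NewTask(A_0)
    history = [rho]
    for b in clicks:
        rho = start(rho, b, LMD, AFT)
        history.append(rho)
    return history

if __name__ == "__main__":
    for rho in replay(["Ag", "Ab", "Ab", "Ay", "Ar", "Ag"]):
        print(rho)
```

The last printed back stack is `[['Ag', 'Ab', 'Ag', 'Ag'], ['Ar'], ['Ay']]`: the green task has been moved to the top and a fresh green activity pushed onto it, as in step 6 of Example 1. The sketch tests the SIT case first because, in all other cases, the branch taken depends only on whether $A$ is a SIT activity and on the affinity comparison.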


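Example 2. The ASM for the ActivitiesLaunchDemo app in Example 1 is $\mathcal{A} =
(Q, \mathrm{Sig}, q_0, \Delta)$, where $Q = \{q_0, q_1\}$, $\mathrm{Sig} = (\mathrm{Act}, \mathrm{Lmd}, \mathrm{Aft}, A_g)$ with

- $\mathrm{Act} = \{A_g, A_b, A_y, A_r\}$, corresponding to the green, blue, yellow and red
activity respectively in the ActivitiesLaunchDemo app,
- $\mathrm{Lmd}(A_g) = \mathrm{STD}$, $\mathrm{Lmd}(A_b) = \mathrm{STP}$, $\mathrm{Lmd}(A_y) = \mathrm{STK}$, $\mathrm{Lmd}(A_r) = \mathrm{SIT}$,
- $\mathrm{Aft}(A_g) = \mathrm{Aft}(A_b) = \mathrm{Aft}(A_r) = 1$, $\mathrm{Aft}(A_y) = 2$,

and $\Delta$ comprises the transitions illustrated in Fig. 2. Below is a path in the transition graph of
$\mathcal{A}$ corresponding to the sequence of user actions clicking the green, blue, blue,
yellow, red, green button (cf. Example 1):

$$
\begin{aligned}
(q_0, \varepsilon) &\xrightarrow{\triangleright, \mathrm{start}(A_g)} (q_1, ([A_g]))
\xrightarrow{A_g, \mathrm{start}(A_g)} (q_1, ([A_g, A_g]))
\xrightarrow{A_g, \mathrm{start}(A_b)} (q_1, ([A_b, A_g, A_g]))\\
&\xrightarrow{A_b, \mathrm{start}(A_b)} (q_1, ([A_b, A_g, A_g]))
\xrightarrow{A_b, \mathrm{start}(A_y)} (q_1, ([A_y], [A_b, A_g, A_g]))\\
&\xrightarrow{A_y, \mathrm{start}(A_r)} (q_1, ([A_r], [A_y], [A_b, A_g, A_g]))
\xrightarrow{A_r, \mathrm{start}(A_g)} (q_1, ([A_g, A_b, A_g, A_g], [A_r], [A_y])).
\end{aligned}
$$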


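Proposition 1 reassures that $\to_{\mathcal{A}}$ is indeed a relation on $\mathrm{Conf}_{\mathcal{A}}$ as per Definition 2.


Proposition 1. Let $\mathcal{A}$ be an ASM. For each $(q, \rho) \in \mathrm{Conf}_{\mathcal{A}}$ and $(q, \rho) \to_{\mathcal{A}} (q', \rho')$,
we have $(q', \rho') \in \mathrm{Conf}_{\mathcal{A}}$, namely, $(q', \rho')$ satisfies the five constraints in Definition 2.


Fig. 2. ASM corresponding to the ActivitiesLaunchDemo app (transitions: $q_0 \xrightarrow{\triangleright, \mathrm{start}(A_g)} q_1$,
and $q_1 \xrightarrow{A_c, \mathrm{start}(A_{c'})} q_1$ for $c, c' \in \{g, b, y, r\}$)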


Remark 1. A single app can clearly be modeled by an ASM. However, ASM can also be used
to model multiple apps which may share tasks/activities. (In this case, these
multiple apps can be composed into a single app, where a new main activity
is added.) This is especially useful when analysing, for instance, task hijacking
[16]. For convenience, we sometimes do not specify the main activity explicitly.
The translation from app source code to ASM is not trivial, but follows standard
routines. In particular, in ASM, the symbols stored in the back stack are just
names of activities. Similar to function calls in programs, Android apps typically
need to store additional local state information. This can be dealt with
by introducing an extended activity alphabet such that each symbol is of the
form $A(b)$, where $A \in \mathrm{Act}$ and $b$ represents local information. When we present
examples, we also adopt this general syntax.

7 If $i$ exists, it must be unique by Definition 2(5). Moreover, $i > 1$, as $\mathrm{Lmd}(A) \neq
\mathrm{SIT} \Rightarrow \mathrm{Aft}(B) \neq \mathrm{Aft}(S_1)$.
8 Note that $B$ occurs at most once in $S_i$ by Definition 2(1).


Model validation. We validate the ASM model by designing "diagnosis" Android
apps with extensive experiments. For each case in the semantics of ASM, we
design an app which contains activities with the corresponding launch modes and
task affinities. To simulate the transition rules of the ASM, each activity contains
some buttons which, when clicked, launch other activities. For instance, in
the case of $\mathrm{Lmd}(B) = \mathrm{STD}$, $\mathrm{Lmd}(A) = \mathrm{SIT}$, $\mathrm{GetNonSITTaskByAft}(\rho, \mathrm{Aft}(B)) =
\mathrm{Undef}$, the app contains two activities $A$ and $B$ of launch modes SIT and STD
respectively, where $A$ is the main activity. When the app is launched, an instance
of $A$ is started. $A$ contains a button which, when clicked, starts an instance of
$B$. We carry out the experiment by clicking the button, monitoring the content
of the back stack, and checking whether the content of the back stack conforms
to the definition of the semantics. Specifically, we check that there are exactly
two tasks in the back stack, one task comprising a single instance of $A$ and
another task comprising a single instance of $B$, with the latter task on the top.
Our experiments are done on a Redmi-4A mobile phone with Android version
6.0.1. The details of the experiments can be found at
https://sites.google.com/site/assconformancetesting/.


4 Reachability of ASM 


Towards formal (static) analysis and verification of Android apps, we study
the fundamental reachability problem of ASM. Fix an ASM $\mathcal{A} = (Q, \mathrm{Sig}, q_0, \Delta)$
with $\mathrm{Sig} = (\mathrm{Act}, \mathrm{Lmd}, \mathrm{Aft}, A_0)$ and a target state $q \in Q$. There are usually two
variants: the state reachability problem asks whether $(q_0, \varepsilon) \to_{\mathcal{A}}^* (q, \rho)$ for some
back stack $\rho$, and the configuration reachability problem asks whether $(q_0, \varepsilon) \to_{\mathcal{A}}^*
(q, \rho)$ when $\rho$ is also given. We show that they are interchangeable as far as decidability
is concerned.


Proposition 2. The configuration reachability problem and the state reachability
problem of ASM are interreducible in exponential time.


Proposition 2 allows us to focus on the state reachability problem in the rest of
this paper. Observe that, when all the activities in an ASM have the same launch
mode, the problem degenerates to that of standard pushdown systems or even
finite-state systems. These systems are well understood, and we refer to [6] for
explanations. To proceed, we deal with the cases where there are exactly two
launch modes, for which there are $\binom{4}{2} = 6$ possibilities. The classification is given
in Theorems 1 and 2. Clearly, they entail that the reachability problem for general ASM
(with at least two launch modes) is undecidable. To show the undecidability, we
reduce from Minsky's two-counter machines [14]; the reduction, albeit standard, reveals
the expressiveness of ASM. We remark that the capability of swapping the order
of two distinct non-SIT-tasks in the back stack, without resetting the content of
either of them, is the main source of undecidability.


Theorem 1. The reachability problem of ASM is undecidable, even when the
ASM contains only (1) STD and STK activities, or (2) STD and SIT activities,
or (3) STK and STP activities, or (4) SIT and STP activities.


In contrast, we have some relatively straightforward positive results: 


Theorem 2. The state reachability problem of ASM is decidable in polynomial 
time when the ASM contains STD and STP activities only, and in polynomial 
space when the ASM contains STK and SIT activities only. 
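For quick reference, the six two-launch-mode fragments and the status that Theorems 1 and 2 assign to them can be tabulated. The snippet below merely enumerates the pairs with `itertools.combinations` and records the statements of the two theorems; the dictionary and function names are ours.

```python
from itertools import combinations

MODES = ("STD", "STP", "STK", "SIT")

# Statements of Theorems 1 and 2, keyed by unordered pairs of launch modes.
UNDECIDABLE = {frozenset(p) for p in [("STD", "STK"), ("STD", "SIT"), ("STK", "STP"), ("SIT", "STP")]}
DECIDABLE = {
    frozenset(("STD", "STP")): "state reachability in PTIME",
    frozenset(("STK", "SIT")): "state reachability in PSPACE",
}


def classify():
    """Map each of the C(4, 2) = 6 two-mode fragments to its status."""
    return {
        pair: "undecidable" if frozenset(pair) in UNDECIDABLE else DECIDABLE[frozenset(pair)]
        for pair in combinations(MODES, 2)
    }


for pair, status in classify().items():
    print(pair, status)
```

Every pair is covered by exactly one of the two theorems, which is why the two results together give a complete classification of the two-mode case.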


As mentioned in Sect. 1, we aim to identify expressive fragments of ASM with 
decidable reachability problems. To this end, we introduce a fragment called 
STK-dominating ASM, which accommodates all four launch modes. 


Definition 3 (STK-dominating ASM). An ASM is said to be STK- 
dominating if the following two constraints are satisfied: 
(1) the task affinities of the STK activities are mutually distinct, 


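(2) for each transition $q \xrightarrow{A, \mathrm{start}(B)} q' \in \Delta$ such that $A \in \mathrm{Act}_{\mathrm{SIT}}$, it holds that
either $B \in \mathrm{Act}_{\mathrm{SIT}} \cup \mathrm{Act}_{\mathrm{STK}}$, or $B \in \mathrm{Act}_{\mathrm{STD}} \cup \mathrm{Act}_{\mathrm{STP}}$ and $\mathrm{Aft}(B) = \mathrm{Aft}(A_0)$.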
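Both constraints of Definition 3 are syntactic, so they can be checked directly on $\Delta$. The sketch below assumes a hypothetical encoding of transitions as tuples `(q, A, inst, q2)`, where `inst` is `"noop"`, `"back"` or `("start", B)` and `">"` stands for $\triangleright$. For the usage example we assume, as an illustration, that the transitions of Example 2 are $q_0 \xrightarrow{\triangleright, \mathrm{start}(A_g)} q_1$ together with all self-loops $q_1 \xrightarrow{A_c, \mathrm{start}(A_{c'})} q_1$.

```python
from itertools import product


def is_stk_dominating(lmd, aft, main, delta):
    """Check constraints (1) and (2) of Definition 3."""
    stk_affinities = [aft[a] for a, mode in lmd.items() if mode == "STK"]
    if len(stk_affinities) != len(set(stk_affinities)):
        return False                       # (1) violated: two STK activities share an affinity
    for _, a, inst, _ in delta:
        if lmd.get(a) != "SIT" or not (isinstance(inst, tuple) and inst[0] == "start"):
            continue                       # (2) only constrains start(B) issued by a SIT activity
        b = inst[1]
        if lmd[b] in ("STD", "STP") and aft[b] != aft[main]:
            return False                   # (2) violated: STD/STP activity with a foreign affinity
    return True


# Example 2, with the transition set assumed above.
LMD = {"Ag": "STD", "Ab": "STP", "Ay": "STK", "Ar": "SIT"}
AFT = {"Ag": 1, "Ab": 1, "Ay": 2, "Ar": 1}
DELTA = {("q0", ">", ("start", "Ag"), "q1")} | {
    ("q1", a, ("start", b), "q1") for a, b in product(LMD, repeat=2)
}
print(is_stk_dominating(LMD, AFT, "Ag", DELTA))  # True
```

If, say, the blue activity had affinity 2 instead of 1, the red (SIT) activity starting it would violate constraint (2), and the check would return `False`.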


The following result explains the name “STK-dominating”. 


Proposition 3. Let $\mathcal{A} = (Q, \mathrm{Sig}, q_0, \Delta)$ be an STK-dominating ASM with
$\mathrm{Sig} = (\mathrm{Act}, \mathrm{Lmd}, \mathrm{Aft}, A_0)$. Then each configuration $(q, \rho)$ that is reachable from
the initial configuration $(q_0, \varepsilon)$ in $\mathcal{A}$ satisfies the following constraints: (1) for
each STK activity $A \in \mathrm{Act}$ with $\mathrm{Aft}(A) \neq \mathrm{Aft}(A_0)$, $A$ can only occur at the bottom
of some task in $\rho$, (2) $\rho$ contains at most one STD/STP-task, which, when
it exists, has the same affinity as $A_0$.


It is not difficult to verify that the ASM given in Example 2 is STK-dominating.

Theorem 3. The state reachability problem of STK-dominating ASM is in 2-EXPTIME.


The proof of Theorem 3 is technically the most challenging part of this paper.
We shall give a sketch in Sect. 5, with the full details in [6].


5 STK-dominating ASM 


For simplicity, we assume that $\mathcal{A}$ contains STD and STK activities only (see footnote 9). To
tackle the (state) reachability problem for STK-dominating ASM, we consider
two cases, i.e., $\mathrm{Lmd}(A_0) = \mathrm{STK}$ and $\mathrm{Lmd}(A_0) \neq \mathrm{STK}$. The former case is simpler
because, by Proposition 3, all tasks will be rooted at STK activities. For the
latter, more general case, the back stack may contain, apart from several tasks
rooted at STK activities, one single task rooted at $A_0$. Sections 5.1 and 5.2 will
handle these two cases respectively.

9 The more general case that $\mathcal{A}$ also contains STP and SIT activities is slightly more
involved and requires more space to present; it can be found in [6].

We will, however, first introduce some standard but necessary background
on pushdown systems. We assume familiarity with standard finite-state automata
(NFA) and finite-state transducers (FST). We emphasize that, in this paper,
FST refers to a special class of finite-state transducers, namely, letter-to-letter
finite-state transducers where the input and output alphabets are the same.


Preliminaries of pushdown systems. A pushdown system (PDS) is a tuple $\mathcal{P} =
(Q, \Gamma, \Delta)$, where $Q$ is a finite set of control states, $\Gamma$ is a finite stack alphabet,
and $\Delta \subseteq Q \times \Gamma \times \Gamma^* \times Q$ is a finite set of transition rules. The size of $\mathcal{P}$, denoted
by $|\mathcal{P}|$, is defined as $|\Delta|$.

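Let $\mathcal{P} = (Q, \Gamma, \Delta)$ be a PDS. A configuration of $\mathcal{P}$ is a pair $(q, w) \in Q \times \Gamma^*$,
where $w$ denotes the content of the stack (with the leftmost symbol being the
top of the stack). Let $\mathrm{Conf}_{\mathcal{P}}$ denote the set of configurations of $\mathcal{P}$. We define
a binary relation $\to_{\mathcal{P}}$ over $\mathrm{Conf}_{\mathcal{P}}$ as follows: $(q, w) \to_{\mathcal{P}} (q', w')$ if $w = \gamma w_1$ and
there exists $w'' \in \Gamma^*$ such that $(q, \gamma, w'', q') \in \Delta$ and $w' = w'' w_1$. We use $\to_{\mathcal{P}}^*$ to
denote the reflexive and transitive closure of $\to_{\mathcal{P}}$.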

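A configuration $(q', w')$ is reachable from $(q, w)$ if $(q, w) \to_{\mathcal{P}}^* (q', w')$. For $C \subseteq
\mathrm{Conf}_{\mathcal{P}}$, $\mathrm{pre}^*(C)$ (resp. $\mathrm{post}^*(C)$) denotes the set of predecessor (resp. successor)
reachable configurations $\{(q', w') \mid \exists (q, w) \in C, (q', w') \to_{\mathcal{P}}^* (q, w)\}$ (resp. $\{(q', w') \mid
\exists (q, w) \in C, (q, w) \to_{\mathcal{P}}^* (q', w')\}$). For $q \in Q$, we define $C_q = \{q\} \times \Gamma^*$ and write
$\mathrm{pre}^*(q)$ and $\mathrm{post}^*(q)$ as shorthand for $\mathrm{pre}^*(C_q)$ and $\mathrm{post}^*(C_q)$ respectively.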

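As standard machinery to solve reachability for PDS, a $\mathcal{P}$-multi-automaton
($\mathcal{P}$-MA) is an NFA $\mathcal{A} = (Q', \Gamma, \delta, I, F)$ such that $I \subseteq Q \subseteq Q'$ [4]. Evidently,
multi-automata are a special class of NFA. Let $\mathcal{A} = (Q', \Gamma, \delta, I, F)$ be a $\mathcal{P}$-MA
and $(q, w) \in \mathrm{Conf}_{\mathcal{P}}$; $(q, w)$ is accepted by $\mathcal{A}$ if $q \in I$ and there is an accepting
run $q_0 q_1 \cdots q_n$ of $\mathcal{A}$ on $w$ with $q_0 = q$. Let $\mathrm{Conf}_{\mathcal{A}}$ denote the set of configurations
accepted by $\mathcal{A}$. Moreover, let $L(\mathcal{A})$ denote the set of words $w$ such that $(q, w) \in
\mathrm{Conf}_{\mathcal{A}}$ for some $q \in I$. For brevity, we usually write MA instead of $\mathcal{P}$-MA when
$\mathcal{P}$ is clear from the context. Moreover, for an MA $\mathcal{A} = (Q', \Gamma, \delta, I, F)$ and $q' \in Q$,
we use $\mathcal{A}(q')$ to denote the MA obtained from $\mathcal{A}$ by replacing $I$ with $\{q'\}$. A set
of configurations $C \subseteq \mathrm{Conf}_{\mathcal{P}}$ is regular if there is an MA $\mathcal{A}$ such that $\mathrm{Conf}_{\mathcal{A}} = C$.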


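Theorem 4 ([4]). Given a PDS $\mathcal{P}$ and a set of configurations accepted by an
MA $\mathcal{A}$, we can compute, in polynomial time in $|\mathcal{P}| + |\mathcal{A}|$, two MAs $\mathcal{A}_{\mathrm{pre}^*}$ and
$\mathcal{A}_{\mathrm{post}^*}$ that recognise $\mathrm{pre}^*(\mathrm{Conf}_{\mathcal{A}})$ and $\mathrm{post}^*(\mathrm{Conf}_{\mathcal{A}})$ respectively.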
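For intuition, a minimal Python sketch of the well-known saturation procedure for $\mathrm{pre}^*$ (due to Bouajjani, Esparza and Maler) is given below; it is our illustration, not a reproduction of the construction in [4]. We assume that every control state of the PDS is a state of the MA and may serve as a starting point (i.e., we do not restrict $I$), that the MA for $C$ has no transitions leading into control states, and that rules $(p, \gamma, w, p')$ rewrite $(p, \gamma v)$ to $(p', w v)$ with words written as strings. All function names are hypothetical.

```python
def step(delta, states, word):
    """States reachable from any state in `states` by reading `word` in the MA."""
    current = set(states)
    for sym in word:
        current = {t for (s, g, t) in delta if s in current and g == sym}
    return current


def pre_star(rules, ma_delta):
    """Saturate MA transitions so that they accept pre*(C).

    rules: PDS rules (p, gamma, w, p2) meaning (p, gamma v) -> (p2, w v).
    ma_delta: transitions (s, gamma, t) of an MA for C; a configuration
    (p, v) is accepted iff p --v--> some final state.
    """
    delta = set(ma_delta)
    changed = True
    while changed:
        changed = False
        for p, gamma, w, p2 in rules:
            # If p2 --w--> t already holds, then (p, gamma) must also lead to t.
            for t in step(delta, {p2}, w):
                if (p, gamma, t) not in delta:
                    delta.add((p, gamma, t))
                    changed = True
    return delta


def accepts(delta, finals, p, word):
    return bool(step(delta, {p}, word) & finals)


# Toy PDS: pop "a" in p; pop "b" and move to q; in q, replace "c" by "ab" and move to p.
RULES = [("p", "a", "", "p"), ("p", "b", "", "q"), ("q", "c", "ab", "p")]
D = pre_star(RULES, set())          # C = {(q, empty stack)}: no transitions, final state q
print(accepts(D, {"q"}, "q", "cc"))  # True
print(accepts(D, {"q"}, "p", "aba")) # False
```

Since only transitions of the form $(p, \gamma, t)$ with $p$ a control state and $t$ an existing MA state are added, the loop terminates after polynomially many additions, in line with the polynomial bound stated in Theorem 4.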


The connection between ASM and PDS is rather obvious. In a nutshell, an
ASM can be considered as a PDS with multiple stacks, whose reachability problem is well known to
be undecidable in general. Our overall strategy to attack the state reachability
problem for the fragments of ASM is to simulate them (in particular, the multiple
stacks) via PDS or, in some cases, via decidable extensions of PDS.
5.1 Case $\mathrm{Lmd}(A_0) = \mathrm{STK}$


Our approach to tackle this case is to simulate $\mathcal{A}$ by an extension of PDS, i.e.,
pushdown systems with transductions (TrPDS), proposed in [19]. In TrPDS, each
transition is associated with an FST defining how the stack content is modified.
Formally, a TrPDS is a tuple $\mathcal{P} = (Q, \Gamma, \mathcal{T}, \Delta)$, where $Q$ and $\Gamma$ are precisely
the same as those of PDS, $\mathcal{T}$ is a finite set of FSTs over the alphabet $\Gamma$, and
$\Delta \subseteq Q \times \Gamma \times \Gamma^* \times \mathcal{T} \times Q$ is a finite set of transition rules. Let $R(\mathcal{T})$ denote the
set of transductions defined by FSTs from $\mathcal{T}$, and let $[R(\mathcal{T})]$ denote the closure
of $R(\mathcal{T})$ under composition and left-quotient. A TrPDS $\mathcal{P}$ is said to be finite if
$[R(\mathcal{T})]$ is finite.

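The configurations of $\mathcal{P}$ are defined similarly as in PDS. We define a binary
relation $\to_{\mathcal{P}}$ on $\mathrm{Conf}_{\mathcal{P}}$ as follows: $(q, w) \to_{\mathcal{P}} (q', w')$ if there are $\gamma \in \Gamma$, words
$w_1, u, w_2$, and $T \in \mathcal{T}$ such that $w = \gamma w_1$, $(q, \gamma, u, T, q') \in \Delta$, $w_1 \xrightarrow{T} w_2$ (i.e., $T$
transduces $w_1$ into $w_2$), and $w' = u w_2$. Let $\to_{\mathcal{P}}^*$ denote the reflexive and transitive closure of $\to_{\mathcal{P}}$.
Similarly to PDS, we can define $\mathrm{pre}^*(\cdot)$ and $\mathrm{post}^*(\cdot)$ respectively. Regular sets
of configurations of TrPDS can be represented by MA, in line with PDS. More
precisely, given a finite TrPDS $\mathcal{P} = (Q, \Gamma, \mathcal{T}, \Delta)$ and an MA $\mathcal{A}$ for $\mathcal{P}$, one can
compute, in time polynomial in $|\mathcal{P}| + |[R(\mathcal{T})]| + |\mathcal{A}|$, two MAs $\mathcal{A}_{\mathrm{pre}^*}$ and $\mathcal{A}_{\mathrm{post}^*}$
that recognize the sets $\mathrm{pre}^*(\mathrm{Conf}_{\mathcal{A}})$ and $\mathrm{post}^*(\mathrm{Conf}_{\mathcal{A}})$ respectively [17-19].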

To simulate $\mathcal{A}$ via a finite TrPDS $\mathcal{P}$, the back stack $\rho = (S_1, \ldots, S_n)$ of $\mathcal{A}$
is encoded by a word $S_1 \sharp \cdots S_n \sharp \perp$ (where $\sharp$ is a delimiter and $\perp$ is the bottom
symbol of the stack), which is stored in the stack of $\mathcal{P}$. Recall that, in this
case, each task $S_i$ is rooted at an STK-activity which sits at the bottom of $S_i$.
Suppose $\mathrm{top}(S_1) = A$. When a transition $(q, A, \mathrm{start}(B), q')$ with $B \in \mathrm{Act}_{\mathrm{STK}}$ is
fired, according to the semantics of $\mathcal{A}$, the $B$-task of $\rho$, say $S_i$, is switched to
the top of $\rho$ and changed into $[B]$ (i.e., all the activities in the $B$-task, except $B$
itself, are popped). To simulate this in $\mathcal{P}$, we replace every stack symbol in the
place of $S_i$ with a dummy symbol $\dagger$ and keep the other symbols unchanged. On
the other hand, to simulate a back action of $\mathcal{A}$, $\mathcal{P}$ continues popping until the
next non-dummy and non-delimiter symbol is seen.
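The encoding can be illustrated with plain list operations. The sketch below is only our reading of the description above, written without transducers: in the paper the overwriting is carried out by the FSTs of the TrPDS and the full construction is in [6]. We assume that, after the old $B$-task has been blanked, the fresh task $[B]$ is pushed together with a delimiter. The strings `"#"`, `"~"` and `"_|_"` stand for $\sharp$, $\dagger$ and $\perp$, and all function names are hypothetical.

```python
DELIM, DUMMY, BOTTOM = "#", "~", "_|_"   # stand-ins for the delimiter, the dummy and the bottom symbol


def encode(rho):
    """(S_1, ..., S_n) -> S_1 # ... S_n # _|_, top of the stack first."""
    word = []
    for task in rho:
        word += task + [DELIM]
    return word + [BOTTOM]


def decode(word):
    """Read a back stack back: drop dummies, split at delimiters, ignore empty segments."""
    rho, task = [], []
    for sym in word:
        if sym in (DELIM, BOTTOM):
            if task:
                rho.append(task)
            task = []
        elif sym != DUMMY:
            task.append(sym)
    return rho


def push_std(word, b):
    """start(B) for an STD activity B: push B onto the top task."""
    return [b] + word


def start_stk(word, b):
    """start(B) for an STK activity B sitting at the bottom of its task:
    overwrite the old B-task (if any) letter by letter with dummies,
    then put the fresh task [B] on top."""
    out = list(word)
    if b in out:
        j = out.index(b)
        i = j
        while i > 0 and out[i - 1] != DELIM:
            i -= 1                       # walk up to the first symbol of the B-task
        for k in range(i, j + 1):
            out[k] = DUMMY
    return [b, DELIM] + out


def back(word):
    """Pop the top activity, then keep popping dummies and delimiters."""
    out = word[1:]
    while out and out[0] in (DUMMY, DELIM):
        out = out[1:]
    return out


w = encode([["B1"]])
w = start_stk(w, "B2")          # new task [B2] on top
w = push_std(w, "C")            # back stack ([C, B2], [B1])
w = start_stk(w, "B1")          # the B1-task is blanked and [B1] is put on top
print(decode(w))                # [['B1'], ['C', 'B2']]
print(decode(back(w)))          # [['C', 'B2']]
```

Blanking preserves the length of the stack word, which is what makes the rewriting expressible by a letter-to-letter transducer; the dummies are then skipped by the popping loop of `back`.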


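Proposition 4. Let $\mathcal{A} = (Q, \mathrm{Sig}, q_0, \Delta)$ be an STK-dominating ASM with
$\mathrm{Sig} = (\mathrm{Act}, \mathrm{Lmd}, \mathrm{Aft}, A_0)$ and $\mathrm{Lmd}(A_0) = \mathrm{STK}$. Then a finite TrPDS $\mathcal{P} =
(Q', \Gamma, \mathcal{T}, \Delta')$ with $Q \subseteq Q'$ can be constructed in time polynomial in $|\mathcal{A}|$ such
that, for each $q \in Q$, $q$ is reachable from $(q_0, \varepsilon)$ in $\mathcal{A}$ iff $q$ is reachable from
$(q_0, \perp)$ in $\mathcal{P}$.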


For a state $q \in Q$, $\mathrm{pre}^*_{\mathcal{P}}(q)$ can be effectively computed as an MA $\mathcal{B}_q$, and
the reachability of $q$ in $\mathcal{A}$ is reduced to checking whether $(q_0, \perp) \in \mathrm{Conf}_{\mathcal{B}_q}$.


5.2 Case $\mathrm{Lmd}(A_0) \neq \mathrm{STK}$


We then turn to the more general case $\mathrm{Lmd}(A_0) \neq \mathrm{STK}$, which is significantly
more involved. For exposition purposes, we consider an ASM $\mathcal{A}$ where there are
exactly two STK activities $A_1, A_2$, and the task affinity of $A_2$ is the same as
that of the main activity $A_0$ (and thus the task affinity of $A_1$ is different from that
of $A_0$). We also assume that all the activities in $\mathcal{A}$ are "standard" except $A_1, A_2$.
Namely, $\mathrm{Act} = \mathrm{Act}_{\mathrm{STD}} \cup \{A_1, A_2\}$ and, in particular, $A_0 \in \mathrm{Act}_{\mathrm{STD}}$. Neither of these
two assumptions is fundamental, and their generalization is given in [6].

By Proposition 3, there are at most two tasks in the back stack of $\mathcal{A}$. The
two tasks are either an $A_0$-task and an $A_1$-task, or an $A_2$-task and an $A_1$-task.
An $A_2$-task can only surface when the original $A_0$-task is popped empty. If
this happens, no $A_0$-task will be recreated, and thus, according to the
arguments in Sect. 5.1, we can simulate the ASM by TrPDS directly and we are
done. The challenging case is that we have both an $A_0$-task and an $A_1$-task.
To solve the state reachability problem, the main technical difficulty is that
the order of the $A_0$-task and the $A_1$-task may be switched arbitrarily many
times before reaching the target state $q$. Readers may wonder why these two tasks
cannot simply simulate two-counter machines. The reason is that the two tasks
are asymmetric in the sense that, each time the $A_1$-task is switched from
the bottom to the top (by starting the activity $A_1$), the content of the $A_1$-task is
reset to $[A_1]$. But this is not the case for the $A_0$-task: when the $A_0$-task is switched
from the bottom to the top (by starting the activity $A_2$), if it does not contain
$A_2$, then $A_2$ will be pushed into the $A_0$-task; otherwise all the activities above
$A_2$ will be popped and $A_2$ becomes the top activity of the $A_0$-task. Our decision
procedure below utilises this asymmetry of the two tasks.


Intuition of construction. The crux of reachability analysis is to construct a
finite abstraction for the $A_1$-task and incorporate it into the control states of
$\mathcal{A}$, so that we can reduce the state reachability of $\mathcal{A}$ to that of a pushdown system
$\mathcal{P}_{\mathcal{A}}$ (with a single stack). Observe that a run of $\mathcal{A}$ can be seen as a sequence
of task switchings. In particular, an $A_0; A_1; A_0$ switching denotes a path in $\mathcal{A}$,
where the $A_0$-task is on the top in the first and the last configuration, while the
$A_1$-task is on the top in all the intermediate configurations. The main idea of
the reduction is to simulate the $A_0; A_1; A_0$ switching by a "macro"-transition of
$\mathcal{P}_{\mathcal{A}}$. Note that the $A_0$-task becomes the top task again in the last configuration either
by starting the activity $A_2$ or by emptying the $A_1$-task. Suppose that, for an
$A_0; A_1; A_0$ switching, in the first (resp. last) configuration, $q$ (resp. $q'$) is the
control state and $\alpha$ (resp. $\beta$) is the finite abstraction of the $A_1$-task. Then for
the "macro"-transition of $\mathcal{P}_{\mathcal{A}}$, the control state will be updated from $(q, \alpha)$ to
$(q', \beta)$, and the stack content of $\mathcal{P}_{\mathcal{A}}$ is updated accordingly:


- If the $A_0$-task becomes the top task again by starting $A_2$, then the stack content is
updated as follows: if the stack does not contain $A_2$, then $A_2$ will be pushed
into the stack; otherwise all the symbols above $A_2$ will be popped.

- On the other hand, if the $A_0$-task becomes the top task again by emptying the $A_1$-task,
then the stack content is not changed.
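The effect of a macro-transition on the stack of $\mathcal{P}_{\mathcal{A}}$ can be sketched as follows. The Boolean `has_a2` mirrors the bit $b \in \{0, 1\}$ that the construction of $\mathcal{P}_{A_0}$ below keeps in its control states to record whether $A_2$ is in the stack. This is a hypothetical illustration with a list-based stack (top first); the real PDS performs the popping step by step in its "pop" mode rather than in one go.

```python
def macro_update(stack, has_a2, via_start_a2, a2="A2"):
    """Stack of P_A after an A0;A1;A0 switching (top of the stack first).

    via_start_a2: True if the A0-task becomes the top task again by starting A2,
                  False if this happens because the A1-task became empty.
    Returns the new stack and the new value of the bit recording A2's presence.
    """
    if not via_start_a2:
        return list(stack), has_a2        # A1-task emptied: A0-task unchanged
    if not has_a2:
        return [a2] + list(stack), True   # A2 absent: push it
    i = stack.index(a2)                   # A2 present: pop everything above it
    return list(stack[i:]), True


print(macro_update(["X", "A2", "Y", "_|_"], True, True))  # (['A2', 'Y', '_|_'], True)
```

Because the bit is kept in the control state, the PDS knows without inspecting its stack whether it must push $A_2$ or enter the popping mode.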


Roughly speaking, the abstraction of the $A_1$-task must carry the following information:
when the $A_0$-task and the $A_1$-task are the top and the bottom task of the back stack, respectively,
and the $A_0$-task is emptied, can the target state $q$ be reached from the
configuration at that time? As a result, we define the abstraction of the $A_1$-task
whose content is encoded by a word $w \in \mathrm{Act}^*$, denoted by $\alpha(w)$, as the set of all
states $q'' \in Q$ such that the target state $q$ can be reached from $(q'', (w))$ in $\mathcal{A}$.
[Note that during the process in which $q$ is reached from $(q'', (w))$ in $\mathcal{A}$, the $A_0$-task
does not exist anymore, but a (new) $A_2$-task may be formed.] Let $\mathrm{Abs}_{A_1} = 2^Q$.

To facilitate the construction of the PDS $\mathcal{P}_{\mathcal{A}}$, we also need to record how the
abstraction "evolves". For each $(q', A, \alpha) \in Q \times (\mathrm{Act} \setminus \{A_1\}) \times \mathrm{Abs}_{A_1}$, we compute
the set $\mathrm{Reach}(q', A, \alpha)$ consisting of pairs $(q'', \beta)$ satisfying: there is an $A_0; A_1; A_0$
switching such that in the first configuration, $A$ is the top symbol of the $A_0$-task,
$q'$ (resp. $q''$) is the control state of the first (resp. last) configuration, and $\alpha$ (resp.
$\beta$) is the abstraction for the $A_1$-task in the first (resp. last) configuration (see footnote 10).

Computing $\mathrm{Reach}(q', A, \alpha)$. Let $(q', A, \alpha) \in Q \times (\mathrm{Act} \setminus \{A_1\}) \times \mathrm{Abs}_{A_1}$. We first
simulate relevant parts of $\mathcal{A}$ as follows:


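- Following Sect. 5.1, we construct a TrPDS $\mathcal{P}_{A_1,A_2} = (Q_{A_1,A_2}, \Gamma_{A_1,A_2}, \mathcal{T}_{A_1,A_2}, \Delta_{A_1,A_2})$
to simulate the $A_1$-task and the $A_2$-task of $\mathcal{A}$ after the $A_0$-task is emptied, where
$Q_{A_1,A_2} = Q \cup (Q \times Q)$ and $\Gamma_{A_1,A_2} = \mathrm{Act} \cup \{\sharp, \dagger, \perp\}$. Note that $A_0$ may still
occur in $\mathcal{P}_{A_1,A_2}$, as a "standard" activity, even though the $A_0$-task has disappeared.
In addition, we construct an MA $\mathcal{B}_q = (Q_q, \Gamma_{A_1,A_2}, \delta_q, I_q, F_q)$ to represent
$\mathrm{pre}^*_{\mathcal{P}_{A_1,A_2}}(q)$, where $I_q \subseteq Q_{A_1,A_2}$. Then, given a stack content $w \in \mathrm{Act}_{\mathrm{STD}}^* A_1$
of the $A_1$-task, the abstraction $\alpha(w)$ of $w$ is the set of $q'' \in I_q \cap Q$ such that
$(q'', w \sharp \perp) \in \mathrm{Conf}_{\mathcal{B}_q}$.

- We construct a PDS $\mathcal{P}_{A_1} = (Q_{A_1}, \Gamma_{A_1}, \Delta_{A_1})$ to simulate the $A_1$-task of $\mathcal{A}$, where
$\Gamma_{A_1} = (\mathrm{Act} \setminus \{A_2\}) \cup \{\perp\}$. In addition, to compute $\mathrm{Reach}(q', A, \alpha)$ later, we construct
an MA $\mathcal{M}_{(q',A,\alpha)} = (Q_{(q',A,\alpha)}, \Gamma_{A_1}, \delta_{(q',A,\alpha)}, I_{(q',A,\alpha)}, F_{(q',A,\alpha)})$ to represent

$$\mathrm{post}^*_{\mathcal{P}_{A_1}}\big(\{(q_1, A_1 \perp) \mid (q', A, \mathrm{start}(A_1), q_1) \in \Delta\}\big).$$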


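Definition 4. $\mathrm{Reach}(q', A, \alpha)$ comprises

- the pairs $(q'', \beta) \in Q \times \mathrm{Abs}_{A_1}$ satisfying that (1) $(q', A, \mathrm{start}(A_1), q_1) \in \Delta$,
(2) $(q_1, A_1 \perp) \to^*_{\mathcal{P}_{A_1}} (q_2, B w \perp)$, (3) $(q_2, B, \mathrm{start}(A_2), q'') \in \Delta$, and (4) $\beta$
is the abstraction of $Bw$, for some $B \in \mathrm{Act} \setminus \{A_2\}$, $w \in (\mathrm{Act} \setminus \{A_2\})^*$ and
$q_1, q_2 \in Q$,
- the pairs $(q'', \perp)$ such that $(q', A, \mathrm{start}(A_1), q_1) \in \Delta$ and $(q_1, A_1 \perp) \to^*_{\mathcal{P}_{A_1}}
(q'', \perp)$ for some $q_1 \in Q$.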


Importantly, conditions in Definition 4 can be characterized algorithmically. 


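Lemma 1. For $(q', A, \alpha) \in Q \times (\mathrm{Act} \setminus \{A_1\}) \times \mathrm{Abs}_{A_1}$, $\mathrm{Reach}(q', A, \alpha)$ is the
union of

- $\{(q'', \perp) \mid (q'', \perp) \in \mathrm{Conf}_{\mathcal{M}_{(q',A,\alpha)}}\}$ and
- the set of pairs $(q'', \beta) \in Q \times \mathrm{Abs}_{A_1}$ such that there exist $q_2 \in Q$ and $B \in
\mathrm{Act} \setminus \{A_2\}$ satisfying $(q_2, B, \mathrm{start}(A_2), q'') \in \Delta$ and
$$\big(B(\mathrm{Act} \setminus \{A_2\})^* \sharp \perp\big) \cap \big(\mathrm{Act}_{\mathrm{STD}}^* A_1 \sharp \perp\big) \cap \big(L(\mathcal{M}_{(q',A,\alpha)}(q_2))\perp^{-1} \sharp \perp\big) \cap L_\beta \neq \emptyset,$$
where $L(\mathcal{M}_{(q',A,\alpha)}(q_2))\perp^{-1}$ is the set of words $w$ such that $w\perp$ belongs to
$L(\mathcal{M}_{(q',A,\alpha)}(q_2))$, and $L_\beta = \bigcap_{q'' \in \beta} L(\mathcal{B}_q(q'')) \cap \bigcap_{q'' \in Q \setminus \beta} \overline{L(\mathcal{B}_q(q''))}$, with $\overline{L}$
representing the complement language of $L$.

10 As we will see later, $\mathrm{Reach}(q', A, \alpha)$ does not depend on $\alpha$ for the two-task special
case considered here. We choose to keep $\alpha$ for readability.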


Construction of $\mathcal{P}_{\mathcal{A}}$. We first construct a PDS $\mathcal{P}_{A_0} = (Q_{A_0}, \Gamma_{A_0}, \Delta_{A_0})$ to
simulate the $A_0$-task of $\mathcal{A}$. Here $Q_{A_0} = (Q \times \{0, 1\}) \cup (Q \times \{1\} \times \{\mathrm{pop}\})$ and
$\Gamma_{A_0} = \mathrm{Act}_{\mathrm{STD}} \cup \{A_2, \perp\}$. In the control states, 1 (resp. 0)
marks that the activity $A_2$ is in the stack (resp. is not in the stack), and the tag
pop marks that the PDS is in the process of popping until $A_2$. The construction
of the transition set $\Delta_{A_0}$ is relatively straightforward; the details can be found in [6].

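We then define the PDS $\mathcal{P}_{\mathcal{A}} = (Q_{\mathcal{A}}, \Gamma_{A_0}, \Delta_{\mathcal{A}})$, where $Q_{\mathcal{A}} = (\mathrm{Abs}_{A_1} \times Q_{A_0}) \cup
\{q\}$, and $\Delta_{\mathcal{A}}$ comprises the following transitions:


- for each $(p, \gamma, w, p') \in \Delta_{A_0}$ and $\alpha \in \mathrm{Abs}_{A_1}$, we have $((\alpha, p), \gamma, w, (\alpha, p')) \in \Delta_{\mathcal{A}}$
(here $p, p' \in Q_{A_0}$, that is, of the form $(q', b)$ or $(q', b, \mathrm{pop})$), [behaviour of
the $A_0$-task]

- for each $(q', A, \alpha) \in Q \times (\mathrm{Act} \setminus \{A_1\}) \times \mathrm{Abs}_{A_1}$ and $b \in \{0, 1\}$ such that
$L(\mathcal{M}_{(q',A,\alpha)}(q)) \neq \emptyset$, we have $((\alpha, (q', b)), A, A, q) \in \Delta_{\mathcal{A}}$, [switch to the $A_1$-task
and reach $q$ before switching back]

- for each $(q', A, \alpha) \in Q \times (\mathrm{Act} \setminus \{A_1\}) \times \mathrm{Abs}_{A_1}$ and $(q'', \beta) \in \mathrm{Reach}(q', A, \alpha)$
such that $\beta \neq \perp$,
  • if $A \neq A_2$, then we have $((\alpha, (q', 0)), A, A_2 A, (\beta, (q'', 1))) \in \Delta_{\mathcal{A}}$ and
$((\alpha, (q', 1)), A, \varepsilon, (\beta, (q'', 1, \mathrm{pop}))) \in \Delta_{\mathcal{A}}$,
  • if $A = A_2$, then we have $((\alpha, (q', 1)), A_2, A_2, (\beta, (q'', 1))) \in \Delta_{\mathcal{A}}$,
[switch to the $A_1$-task and switch back to the $A_0$-task later by
launching $A_2$]

- for each $(q', A, \alpha) \in Q \times (\mathrm{Act} \setminus \{A_1\}) \times \mathrm{Abs}_{A_1}$, $(q'', \perp) \in \mathrm{Reach}(q', A, \alpha)$ and
$b \in \{0, 1\}$, we have $((\alpha, (q', b)), A, A, (\emptyset, (q'', b))) \in \Delta_{\mathcal{A}}$,
[switch to the $A_1$-task and switch back to the $A_0$-task later when
the $A_1$-task becomes empty]

- for each $\alpha \in \mathrm{Abs}_{A_1}$, $b \in \{0, 1\}$ and $A \in \mathrm{Act}_{\mathrm{STD}} \cup \{A_2\}$, $((\alpha, (q, b)), A, A, q) \in
\Delta_{\mathcal{A}}$, [$q$ is reached when the $A_0$-task is the top task]

- for each $q' \in Q$ and $\alpha \in \mathrm{Abs}_{A_1}$ with $q' \in \alpha$, $((\alpha, (q', 0)), \perp, \perp, q) \in \Delta_{\mathcal{A}}$.
[$q$ is reached after the $A_0$-task becomes empty and the $A_1$-task
becomes the top task]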


Proposition 5. Let $\mathcal{A}$ be an STK-dominating ASM where there are exactly two
STK-activities $A_1, A_2$ and $\mathrm{Aft}(A_2) = \mathrm{Aft}(A_0)$. Then $q$ is reachable from the
initial configuration $(q_0, \varepsilon)$ in $\mathcal{A}$ iff $q$ is reachable from the initial configuration
$((\emptyset, (q_0, 0)), \perp)$ in $\mathcal{P}_{\mathcal{A}}$.
6 Related Work


We first discuss pushdown systems with multiple stacks (MPDSs), which are the
most relevant to ASM. (For space reasons, we skip results on general pushdown
systems.) A multitude of classes of MPDSs have been considered,
mostly as a model for concurrent recursive programs. In general, an ASM can be
encoded as an MPDS. However, this view is hardly profitable, as general MPDSs
are Turing-complete, leaving the reachability problem undecidable.
To regain decidability at least for reachability, several subclasses of MPDSs
were proposed in the literature: (1) bounding the number of context-switches [15],
or more generally, phases [10], scopes [11], or budgets [3]; (2) imposing a linear
ordering on stacks and pop operations being reserved to the first non-empty
stack [5]; (3) restricting control states (e.g., weak MPDSs [7]). However, our
decidable subclasses of ASM satisfy none of the above restrictions. A
unified and generalized criterion [12] based on MSO over graphs of bounded
tree-width was proposed to show the decidability of the emptiness problem for
several restricted classes of automata with auxiliary storage, including MPDSs,
automata with queues, or a mix of them. Since ASMs work in a way fairly
different from multi-stack models in the literature, it is unclear, at least to us,
how to obtain decidability by using the bounded tree-width approach. Moreover,
[12] only provides decidability proofs, without complexity upper bounds. Our
decision procedure is based on symbolic approaches for pushdown systems, which
provide complexity upper bounds and are amenable to implementation.
Higher-order pushdown systems represent another type of generalization of 
pushdown systems through higher-order stacks, i.e., a nested “stack of stacks” 
structure [13], with decidable reachability problems [9]. Despite the apparent 
resemblance, the back stack of ASM cannot be simulated by an order-2 pushdown 
system. The reason is that the order between tasks in a back stack may be 
dynamically changed, which is not supported by order-2 pushdown systems. 
Along a different line, some models have addressed, for instance, 
GUI activities of Android apps. Window transition graphs were proposed for 
representing the possible GUI activity (window) sequences and their associated 
events and callbacks, capturing how the events and callbacks modify 
the back stack [21]. However, the key mechanisms of back stacks (launch modes 
and task affinities) were not covered in this model. Moreover, the reachability 
problem for this model was not investigated. A similar model, the labeled transition 
graph with stack and widget (LATTE [20]), considered the effects of launch modes 
on the back stacks, but not task affinities. LATTE is essentially a finite-state 
abstraction of the back stack. However, to faithfully capture the launch modes 
and task affinities, one needs an infinite-state system, as we have studied here. 


7 Conclusion 


In this paper, we have introduced the Android stack machine (ASM) to formalize the back 
stack system of the Android platform. We have also investigated the decidability 
of the reachability problem of ASM. While the reachability problem of ASM is 
undecidable in general, we have identified a fragment, i.e., STK-dominating ASM, 
which is expressive and admits decision procedures for reachability. 

The implementation of the decision procedures is in progress. We also plan to 
consider other features of Android back stack systems, e.g., the “allowTaskReparenting” 
attribute of activities. A long-term program is to develop an efficient and 
scalable formal analysis and verification framework for Android apps, towards 
which the work reported in this paper is the first cornerstone. 
Abstract. We report on a machine-assisted verification of an efficient 
implementation of Montgomery Multiplication, which is a widely used 
method in cryptography for efficient computation of modular exponentiation. 
We outline the method, give a brief survey of the VeriFun 
system used for verification, present the formal proofs and report on the 
effort for creating them. Our work uncovered a serious fault in a published 
algorithm for computing multiplicative inverses based on Newton-Raphson 
iteration, thus providing further evidence for the benefit of 
computer-aided verification. 
1 Introduction 


Montgomery Multiplication [6] is a method for efficient computation of residues 
a^j mod n, which are widely used in cryptography, e.g. for RSA, Diffie-Hellman, 
ElGamal, DSA, ECC etc. [4,5]. The computation of these residues can be 
seen as an iterative calculation in the commutative ring with identity 

$$R_n = (\mathbb{N}_n, \oplus, i_n, \odot, 0, 1 \bmod n)$$

where n > 1, ℕ_n = {0, ..., n − 1}, addition is defined by 
a ⊕ b = a + b mod n, the inverse operator by i_n(a) = a·(n − 1) mod n, 
multiplication by a ⊙ b = a·b mod n, with neutral element 0 and identity 
1 mod n. 

For any m ∈ ℕ relatively prime to n, some m_n^{-1} ∈ ℕ_n exists such that 
m ⊙ m_n^{-1} = 1 mod n. m_n^{-1} is called the multiplicative inverse of m in R_n and is 
used to define a further commutative ring with identity 

$$R_n^m = (\mathbb{N}_n, \oplus, i_n, \otimes, 0, m \bmod n)$$

where multiplication is defined by a ⊗ b = a ⊙ b ⊙ m_n^{-1} and the identity is 
given as m mod n. The multiplication ⊗ of R_n^m is called Montgomery Multiplication. 

The rings R_n and R_n^m are isomorphic by the isomorphism h : R_n → R_n^m 
defined by h(a) = a ⊙ m and h^{-1} : R_n^m → R_n given by h^{-1}(a) = a ⊙ m_n^{-1}. 
Consequently a·b mod n can be calculated in ring R_n^m as well because 

$$a \cdot b \bmod n = a \odot b = h^{-1}(h(a \odot b)) = h^{-1}(h(a) \otimes h(b)). \tag{$*$}$$
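To make (*) concrete, the following minimal Python sketch builds h, h^{-1} and ⊗ for small parameters and checks the identity exhaustively. It is an illustration, not part of the VeriFun development; the helper name montgomery_ring and the values n = 13, m = 16 are arbitrary choices, and Python 3.8+ is assumed for the modular inverse pow(m, -1, n).

```python
from math import gcd


def montgomery_ring(n: int, m: int):
    """Return h, h^-1 and the Montgomery Multiplication of R_n^m (needs n > 1, gcd(m, n) = 1)."""
    if n <= 1 or gcd(m, n) != 1:
        raise ValueError("need n > 1 and m coprime to n")
    m_inv = pow(m, -1, n)  # m_n^{-1}, the multiplicative inverse of m in R_n

    def h(a: int) -> int:  # h(a) = a ⊙ m
        return a * m % n

    def h_inv(a: int) -> int:  # h^{-1}(a) = a ⊙ m_n^{-1}
        return a * m_inv % n

    def mont_mul(a: int, b: int) -> int:  # a ⊗ b = a ⊙ b ⊙ m_n^{-1}
        return a * b * m_inv % n

    return h, h_inv, mont_mul


n, m = 13, 16
h, h_inv, mont_mul = montgomery_ring(n, m)
# (*): a·b mod n = h^{-1}(h(a) ⊗ h(b)) for all a, b in N_n
assert all(a * b % n == h_inv(mont_mul(h(a), h(b))) for a in range(n) for b in range(n))
# m mod n is the identity element of ⊗
assert all(mont_mul(a, m % n) == a for a in range(n))
```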


© The Author(s) 2018 
H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 505-522, 2018. 
https://doi.org/10.1007/978-3-319-96142-2_30 
function redc(x, z, m, n:ℕ):ℕ <=
  if m ≠ 0
  then let q := (x + n·(x·z mod m))/m in
         if n > q then q else q − n end_if
       end_let
  end_if

function redc*(x, z, m, n, j:ℕ):ℕ <=
  if m ≠ 0
  then if n ≠ 0
       then if j = 0
            then m mod n
            else redc(x·redc*(x, z, m, n, ⁻(j)), z, m, n)
            end_if
       end_if
  end_if


Fig. 1. Procedures redc and redc* implementing the Montgomery Reduction 


The required operations h, ⊗ and h^{-1} can be implemented by the so-called 
Montgomery Reduction redc [6] (displayed in Fig. 1), as stated by Theorem 1: 


Theorem 1. Let a, b, n, m ∈ ℕ with m > n > a, n > b and n, m relatively 
prime, let I = i_m(n_m^{-1}) and let M = m² mod n. Then I is called the Montgomery 
Inverse and (1) h(a) = redc(a·M, I, m, n), (2) a ⊗ b = redc(a·b, I, m, n), and 
(3) h^{-1}(a) = redc(a, I, m, n). 
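Theorem 1 can be spot-checked numerically. The Python sketch below transcribes redc from Fig. 1 and computes I = i_m(n_m^{-1}) and M = m² mod n; the helper montgomery_constants is a hypothetical name introduced here, Python 3.8+ is assumed for pow(n, -1, m), and the expected values on the right-hand sides are computed directly from the definitions of h, ⊗ and h^{-1}.

```python
from math import gcd


def redc(x: int, z: int, m: int, n: int) -> int:
    """Montgomery Reduction, transcribed from Fig. 1 (undefined for m = 0)."""
    if m == 0:
        raise ValueError("redc is undefined for m = 0")
    q = (x + n * (x * z % m)) // m  # exact division when z is the Montgomery Inverse
    return q if n > q else q - n    # at most one subtraction of n


def montgomery_constants(n: int, m: int):
    """Return I = i_m(n_m^{-1}) (the Montgomery Inverse) and M = m^2 mod n."""
    if not (m > n > 0 and gcd(n, m) == 1):
        raise ValueError("need m > n > 0 and gcd(n, m) = 1")
    n_inv = pow(n, -1, m)        # n_m^{-1}
    I = n_inv * (m - 1) % m      # i_m(a) = a·(m - 1) mod m
    M = m * m % n
    return I, M


n, m = 13, 16
I, M = montgomery_constants(n, m)
m_inv = pow(m, -1, n)            # m_n^{-1}, used only to state the expected values
for a in range(n):
    assert redc(a * M, I, m, n) == a * m % n              # (1) h(a)
    assert redc(a, I, m, n) == a * m_inv % n              # (3) h^{-1}(a)
    for b in range(n):
        assert redc(a * b, I, m, n) == a * b * m_inv % n  # (2) a ⊗ b
```

Note that the theorem does not require m to be a power of 2; the sketch works equally for, say, n = 7 and m = 10.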


By (*) and Theorem 1, a·b mod n can be computed by procedure redc, and 
consequently a^j mod n can be computed by iterated calls of redc (implemented 
by procedure redc* of Fig. 1), as stated by Theorem 2: 


Theorem 2. Let a, n, m, I and M be as in Theorem 1. Then for all j ∈ ℕ: 

$$a^j \bmod n = \mathit{redc}(\mathit{redc}^*(\mathit{redc}(a \cdot M, I, m, n), I, m, n, j), I, m, n).$$
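The following Python sketch mirrors Theorem 2. redc_star unrolls the recursion of redc* from Fig. 1 into a loop (starting from the identity m mod n and applying j Montgomery Multiplications by x), and the hypothetical helper mont_pow composes the three stages of the theorem. Python's built-in pow(a, j, n) serves as the reference; it also uses 0^0 = 1, matching footnote 1.

```python
from math import gcd


def redc(x: int, z: int, m: int, n: int) -> int:
    """Montgomery Reduction as in Fig. 1."""
    q = (x + n * (x * z % m)) // m
    return q if n > q else q - n


def redc_star(x: int, z: int, m: int, n: int, j: int) -> int:
    """redc* of Fig. 1 as a loop: j Montgomery Multiplications by x."""
    if m == 0 or n == 0:
        raise ValueError("redc* is undefined for m = 0 or n = 0")
    acc = m % n                       # j = 0: the identity m mod n of R_n^m
    for _ in range(j):
        acc = redc(x * acc, z, m, n)  # acc := x ⊗ acc
    return acc


def mont_pow(a: int, j: int, n: int, m: int) -> int:
    """a^j mod n computed as in Theorem 2 (needs m > n > a and gcd(n, m) = 1)."""
    if not (m > n > a and gcd(n, m) == 1):
        raise ValueError("need m > n > a and gcd(n, m) = 1")
    I = pow(n, -1, m) * (m - 1) % m   # Montgomery Inverse i_m(n_m^{-1})
    M = m * m % n
    return redc(redc_star(redc(a * M, I, m, n), I, m, n, j), I, m, n)


assert all(mont_pow(a, j, 13, 16) == pow(a, j, 13) for a in range(13) for j in range(20))
```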


By Theorem 2, j + 2 calls of redc are required for computing a^j mod n, viz. 
one call to map a to h(a), j calls for the Montgomery Multiplications and one 
call for mapping the result back with h^{-1}. This approach allows for an efficient 
computation of a^j mod n in R_n^m (for sufficiently large j) if m is chosen as a power 
of 2 and n as some odd number, because x mod m can then be computed in 
constant time and x/m only needs an effort proportional to log m in procedure 
redc, thus saving the expensive mod n operations in R_n. 
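For m = 2^k the two operations on m inside redc reduce to bit operations: x mod 2^k is x & (2^k − 1) and x/2^k is x >> k. The Python sketch below shows this specialisation under the assumption that n is odd and smaller than 2^k; redc_pow2 is a hypothetical name, and I is computed as −n^{-1} mod m, which equals i_m(n_m^{-1}) = n_m^{-1}·(m − 1) mod m.

```python
def redc_pow2(x: int, z: int, k: int, n: int) -> int:
    """redc specialised to m = 2**k: 'mod m' becomes a bit mask, '/ m' a right shift."""
    mask = (1 << k) - 1                  # x mod 2**k == x & mask
    q = (x + n * ((x * z) & mask)) >> k  # the sum is a multiple of 2**k when z = I
    return q if n > q else q - n


k, n = 5, 21                 # m = 32 > n, n odd
m = 1 << k
I = -pow(n, -1, m) % m       # Montgomery Inverse: -n^{-1} mod m
m_inv = pow(m, -1, n)
# For x < n·m, redc returns x·m^{-1} mod n without any division by n
assert all(redc_pow2(x, I, k, n) == x * m_inv % n for x in range(n * m))
```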


¹ Exponentiation is defined here with 0⁰ = 1, so that redc(redc*(redc(0·M, I, m, n), I, 
m, n, 0), I, m, n) = 1 mod n holds in particular. 
2 About VeriFun 


The truth of Theorems 1 and 2 is not obvious at all, and some number theory with 
modular arithmetic is needed for proving them. Formal proofs are worthwhile 
because the correctness of cryptographic methods is based on these theorems. 


structure bool <= true, false

structure ℕ <= 0, ⁺(⁻:ℕ)

structure signs <= ‘+’, ‘−’

structure ℤ <= [outfix] ( , )(sign:signs, [outfix] | |:ℕ)

structure triple[@T1, @T2, @T3] <= [outfix] < , > ([postfix] ( )₁:@T1, [postfix] ( )₂:@T2, [postfix] ( )₃:@T3)

lemma z ≠ 0 → [x·(y mod z) ≡ x·y] mod z <= ∀x, y, z:ℕ
  if{z ≠ 0, (x·(y mod z) mod z) = (x·y mod z), true}


Fig. 2. Data structures and lemmas in VeriFun 


Proof assistants like Isabelle/HOL, HOL Light, Coq, ACL2 and others have 
proven successful for developing formal proofs in Number Theory (see e.g. 
[14]). Here we use the VeriFun system² [7,10] to verify correctness of Montgomery 
Multiplication by proving Theorems 1 and 2. The system’s object language 
consists of universal first-order formulas plus parametric polymorphism. 
Type variables may be instantiated with polymorphic types. Higher-order functions 
are not supported. The language provides principles for defining data structures, 
procedures operating on them, and statements (called “lemmas”) about 
the data structures and procedures. Unicode symbols may be used, and function 
symbols can be written in out-, in-, pre- and postfix notation, so that readability 
is increased by use of the familiar mathematical notation. Figure 2 displays some 
examples. The data structure bool and the data structure ℕ for natural numbers, 
built with the constructors 0 and ⁺(...) for the successor function, are the only 
predefined data structures in the system. ⁻(...) is the selector of ⁺(...), thus 
representing the predecessor function. Subsequently we need integers ℤ as well, 
which we define in Fig. 2 as signed natural numbers. For instance, the expression 
(‘−’, 42) is a data object of type ℤ, selector sign yields the sign of an integer (like 
‘−’ in the example), and selector |...| gives the absolute value of an integer (like 
42 in the example). Identifiers preceded by @ denote type variables, and therefore 
polymorphic triples are defined in Fig. 2. The expression <42, (‘+’, 47), (‘−’, 5)> 
is an example of a data object of type triple[ℕ, ℤ, ℤ]. The i-th component of a 
triple is obtained by selector (...)_i. 
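For readers more familiar with mainstream languages, the following Python sketch is a rough analog of the ℤ and triple structures of Fig. 2 (it is not VeriFun syntax). The class name Z, the field name mag and the use of ASCII '+'/'-' for the two constructors of signs are assumptions of this sketch; tuple indexing t[i − 1] plays the role of the selector (t)_i. As in the sign-magnitude definition, both ('+', 0) and ('-', 0) denote zero.

```python
from typing import NamedTuple, Tuple


class Z(NamedTuple):
    """Rough analog of structure ℤ: a sign together with a natural-number magnitude."""
    sign: str  # '+' or '-'
    mag: int   # the absolute value |z|, assumed >= 0

    def to_int(self) -> int:
        return self.mag if self.sign == '+' else -self.mag


# A data object of type triple[ℕ, ℤ, ℤ], as in the example <42, ('+', 47), ('−', 5)>
t: Tuple[int, Z, Z] = (42, Z('+', 47), Z('-', 5))
print(t[0], t[1].to_int(), t[2].to_int())  # 42 47 -5
```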

Procedures are defined by if- and case-conditionals, functional composition 
and recursion as displayed in Fig. 1. Procedure calls are evaluated eagerly, 
i.e. call-by-value. The use of incomplete conditionals as in redc and redc* 
results in incompletely defined procedures [12]. Such a feature is required when 
working with polymorphic data structures, but it is useful for monomorphic data 
structures too, as it avoids the need for stipulating artificial results, e.g. for 
n/0. Predicates are defined by procedures with result type bool. Procedure 
function[infix] > (x, y:ℕ):bool <= ... for deciding the greater-than relation is 
the only predefined procedure in the system. Upon the definition of a procedure, 
VeriFun’s automated termination analysis (based on the method of Argument-Bounded 
Functions [8,11]) is invoked for generating termination hypotheses, 
which are sufficient for the procedure’s termination and are proved like lemmas. 
Afterwards, induction axioms are computed from the terminating procedures’ 
recursion structure and kept in stock for future use. 

² An acronym for “A Verifier for Functional Programs”. 

Lemmas are defined with conditionals if : bool × bool × bool → bool as the 
main connective, but negation ¬ and case-conditionals may be used as well. 
Only universal quantification is allowed for the variables of a lemma. Figure 2 
displays a lemma about (the elsewhere defined) procedure mod (computing the 
remainder function), which is frequently used in subsequent proofs. The string 
in the headline (between “lemma” and “<=”) is just an identifier assigning a 
name to the lemma for reference and must not be confused with the statement 
of the lemma, given as a boolean term in the lemma body. Some basic lemmas 
about equality and >, e.g. stating transitivity of = and >, are predefined in the 
system. Predefined lemmas are frequently used in almost every case study, so 
that work is eased by having them always available instead of importing them 
from some proof library. 

Lemmas are proved with the HPL-calculus (abbreviating Hypotheses, Programs 
and Lemmas) [10]. The most relevant proof rules of this calculus are 
Induction, Use Lemma, Apply Equation, Unfold Procedure, Case Analysis and 
Simplification. Formulas are given as sequents of the form H, IH ⊢ goal, where H 
is a finite set of hypotheses given as literals, i.e. negated or unnegated predicate 
calls and equations, IH is a finite set of induction hypotheses given as partially 
quantified boolean terms, and goal is a boolean term, called the goalterm of the 
sequent. A deduction in the HPL-calculus is represented by a tree whose nodes 
are given by sequents. A lemma ℒ with body ∀... goal is verified iff (i) the goalterm 
of each sequent at a leaf of the proof tree rooted in {}, {} ⊢ goal equals true 
and (ii) each lemma applied by Use Lemma or Apply Equation when building 
the proof tree is verified. The base of this recursive definition is given by lemmas 
being proved without using other lemmas. Induction hypotheses are treated like 
verified lemmas, however being available only in the sequent they belong to. 

The Induction rule creates the base and step cases for a lemma from an induction 
axiom. By choosing Simplification, the system’s first-order theorem prover, 
called the Symbolic Evaluator, is started for rewriting a sequent’s goalterm 
using the hypotheses and induction hypotheses of the sequent, the definitions 
of the data structures and procedures as well as the lemmas already verified. 
This reasoner is guided by heuristics, e.g. for deciding whether to use a procedure 
definition, for speeding up proof search by filtering out useless lemmas, 
etc. Equality reasoning is implemented by conditional term rewriting with 
AC-matching, where the orientation of equations is heuristically established [13]. 
The Symbolic Evaluator is a fully automatic tool over which the user has no 
control, thus leaving the HPL-proof rules as the only means to guide the system 
to a proof. 

The HPL-calculus is controlled by heuristics as well. When applying the Verify 
command to a lemma, the system starts to compute a proof tree by choosing 
appropriate HPL-proof rules heuristically. If a proof attempt gets stuck, the user 
must step in by applying a proof rule to some leaf of the proof tree (sometimes 
after pruning some unwanted branch of the tree), and the system then takes over 
control again. It may also happen that a further lemma must be formulated by 
the user before the proof under consideration can be completed. All interactions 
are menu-driven, so that typing in proof scripts is avoided (see [7,10]). 

VeriFun is implemented in Java, and installers for running the system under 
Windows, Unix/Linux or Mac are available from the web [7]. When working 
with the system, we use proof libraries which have been set up over the years 
by extending them with definitions and lemmas of general interest. When 
importing a definition or a lemma from a library into a case study, all program 
elements and proofs the imported item depends on are imported as well. The 
correctness proofs for Montgomery Multiplication depend on 9 procedures and 96 
lemmas from our arithmetic proof library, which range from simple statements 
like associativity and commutativity of addition up to more ambitious theorems 
about primes and modular arithmetic. In the sequel we only list the lemmas 
which are essential to understand the proofs and refer to [7] for a complete 
account of all used lemmas and their proofs. 


3 Multiplicative Inverses 


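We start our development by stipulating how multiplicative inverses are computed. 
To this effect we have to define some procedure 𝔍 : ℕ × ℕ → ℕ satisfying³ 

$$\forall x,y{:}\mathbb{N}\;\; y \neq 0 \land \gcd(x,y) = 1 \to [x \cdot \mathfrak{J}(x,y) \equiv 1] \bmod y \tag{1}$$
$$\forall x,y,z{:}\mathbb{N}\;\; y \neq 0 \land \gcd(x,y) = 1 \to [z \cdot x \cdot \mathfrak{J}(x,y) \equiv z] \bmod y \tag{2}$$
$$\forall n,x,y,z{:}\mathbb{N}\;\; y \neq 0 \land \gcd(x,y) = 1 \to [n + z \cdot x \cdot \mathfrak{J}(x,y) \equiv n + z] \bmod y \tag{3}$$

Lemma 2 is proved with Lemma 1 and library lemma 

$$\forall n,m,x,y{:}\mathbb{N}\;\; \gcd(n,m) = 1 \land [m \cdot x \equiv m \cdot y] \bmod n \to [x \equiv y] \bmod n \tag{4}$$

after instructing the system to use library lemma 

$$\forall x,y,z{:}\mathbb{N}\;\; z \neq 0 \to [x \cdot (y \bmod z) \equiv x \cdot y] \bmod z \tag{5}$$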


³ If x, y, z ∈ ℤ and n ∈ ℕ, then n | z abbreviates z mod n = 0, where z mod n = −(|z| 
mod n) if z < 0, and [x ≡ y] mod n stands for n | x − y. x mod n = y mod n is 
sufficient for [x ≡ y] mod n, but only necessary if x and y have the same polarity. 
and VeriFun proves Lemma 3 automatically using Lemma 2 as well as library 
lemma 

$$\forall n,x,y,z{:}\mathbb{N}\;\; z \neq 0 \land [x \equiv y] \bmod z \to [x + n \equiv y + n] \bmod z. \tag{6}$$
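Lemmas (1) to (6) can be sanity-checked by brute force over small numbers. The Python sketch below implements the congruence [x ≡ y] mod n of footnote 3 as "n divides the integer difference x − y" and uses a brute-force search as a hypothetical stand-in for procedure 𝔍; such a check is of course no substitute for the machine proofs, it only illustrates what the statements say.

```python
from math import gcd


def congruent(x: int, y: int, n: int) -> bool:
    """[x ≡ y] mod n as in footnote 3: n divides the integer difference x - y."""
    return (x - y) % n == 0


def inverse(x: int, y: int) -> int:
    """Brute-force stand-in for 𝔍(x, y), assuming y != 0 and gcd(x, y) = 1."""
    return next(t for t in range(y) if x * t % y == 1 % y)


def check(bound: int) -> bool:
    r = range(bound)
    for x in r:
        for y in range(1, bound):
            if gcd(x, y) == 1:
                j = inverse(x, y)
                assert congruent(x * j, 1, y)                                         # (1)
                assert all(congruent(z * x * j, z, y) for z in r)                     # (2)
                assert all(congruent(n + z * x * j, n + z, y) for n in r for z in r)  # (3)
    for n in range(1, bound):
        for m in r:
            if gcd(n, m) == 1:
                assert all(congruent(x, y, n) for x in r for y in r
                           if congruent(m * x, m * y, n))                             # (4)
    for z in range(1, bound):
        assert all(congruent(x * (y % z), x * y, z) for x in r for y in r)            # (5)
        assert all(congruent(x + n, y + n, z) for n in r for x in r for y in r
                   if congruent(x, y, z))                                             # (6)
    return True


assert check(12)
```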


Multiplicative inverses can be computed straightforwardly with Euler’s φ-function, 
where Lemma 1 is then proved with Euler’s Theorem [7,14]. But this 
approach is very costly and therefore unsuitable for an implementation of Montgomery 
Multiplication. 
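For comparison, here is a Python sketch of the Euler-based approach: by Euler's Theorem, x^(φ(y)−1) mod y is an inverse of x modulo y whenever gcd(x, y) = 1. The function names are hypothetical, and φ is computed by plain counting, which already costs time linear in y and illustrates why this route is unattractive for large moduli.

```python
from math import gcd


def phi(y: int) -> int:
    """Euler's φ-function by counting: deliberately naive, y gcd computations."""
    return sum(1 for k in range(1, y + 1) if gcd(k, y) == 1)


def inverse_euler(x: int, y: int) -> int:
    """x^(φ(y) - 1) mod y inverts x modulo y whenever gcd(x, y) = 1 (Euler's Theorem)."""
    if y == 0 or gcd(x, y) != 1:
        raise ValueError("need y != 0 and gcd(x, y) = 1")
    return pow(x, phi(y) - 1, y)


# For a power-of-two modulus 2**k we have φ(2**k) = 2**(k - 1)
assert phi(2 ** 10) == 2 ** 9
assert all(x * inverse_euler(x, 1024) % 1024 == 1 for x in range(1, 1024, 2))
```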


function euclid(x, y:ℕ):triple[ℕ, ℤ, ℤ] <=
  if y = 0
  then < x, (‘+’, 1), (‘+’, 0) >
  else let e := euclid(y, (x mod y)), g := (e)₁, s := (e)₂, t := (e)₃ in
         case sign(s) of
           ‘+’ : < g, (‘−’, |t|), (‘+’, |s| + (x/y)·|t|) >,
           ‘−’ : < g, (‘+’, |t|), (‘−’, |s| + (x/y)·|t|) >
         end_case
       end_let
  end_if

function 𝔍_E(x, y:ℕ):ℕ <=
  if y ≠ 0
  then let s := (euclid(x, y))₂ in
         case sign(s) of ‘+’ : (|s| mod y), ‘−’ : y − (|s| mod y) end_case
       end_let
  end_if


Fig. 3. Computation of multiplicative inverses by the extended Euclidean algorithm 
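A Python transcription of Fig. 3 may help to see how the signs alternate. In this sketch the ‘+’ branch mirrors the ‘−’ branch with the signs swapped, which is what Lemma 7 of Fig. 4 requires; the names Z, euclid and inv_euclid (for 𝔍_E) are our own, and the final loop checks Bézout's Lemma in the sign-split form of Lemma 7 together with Lemma 8 and the inverse property for small arguments.

```python
from math import gcd
from typing import NamedTuple, Tuple


class Z(NamedTuple):
    sign: str  # '+' or '-'
    mag: int   # absolute value, >= 0


def euclid(x: int, y: int) -> Tuple[int, Z, Z]:
    """Extended Euclidean algorithm in sign-magnitude form, following Fig. 3."""
    if y == 0:
        return (x, Z('+', 1), Z('+', 0))
    g, s, t = euclid(y, x % y)
    q = x // y
    if s.sign == '+':
        return (g, Z('-', t.mag), Z('+', s.mag + q * t.mag))
    return (g, Z('+', t.mag), Z('-', s.mag + q * t.mag))


def inv_euclid(x: int, y: int) -> int:
    """Procedure 𝔍_E: an inverse of x modulo y that stays within the naturals."""
    if y == 0:
        raise ValueError("𝔍_E is undefined for y = 0")
    s = euclid(x, y)[1]
    return s.mag % y if s.sign == '+' else y - s.mag % y


for x in range(40):
    for y in range(40):
        g, s, t = euclid(x, y)
        assert g == gcd(x, y)                        # Lemma 8
        if s.sign == '+':
            assert x * s.mag == y * t.mag + g        # Lemma 7, case '+'
        else:
            assert x * s.mag + g == y * t.mag        # Lemma 7, case '-'
        if y != 0 and g == 1:
            assert x * inv_euclid(x, y) % y == 1 % y  # inverse property (9)
```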


3.1 Bézout’s Lemma 


A more efficient implementation of procedure 𝔍 is based on Bézout’s Lemma, 
stating that the greatest common divisor can be represented as a linear 
combination of its arguments: 


Bézout’s Lemma 
For all x, y ∈ ℕ some s, t ∈ ℤ exist such that gcd(x, y) = x·s + y·t. 


If y ≠ 0, 𝔍_E(x, y) := s mod y is defined and gcd(x, y) = 1 holds, then by 
Bézout’s Lemma [x·𝔍_E(x, y) = x·(s mod y) ≡ x·s ≡ x·s + y·t = 1] mod y. To 
implement this approach, the integer s needs to be computed, which can be 
performed by the extended Euclidean algorithm displayed in Fig. 3. This approach 
is more efficient because a call of euclid(x, y) (and in turn of 𝔍_E(x, y) given as 
in Fig. 3) can be computed in time proportional to (log y)² if x < y, whereas 
the use of Euler’s φ-function needs time proportional to 2^{log y} in the context of 
Montgomery Multiplication (as φ(2^{k+1}) = 2^k). 

However, s ∈ ℤ might be negative, so that y + (s mod y) ∈ ℕ instead of 
s mod y must then be used as the multiplicative inverse of x, because the carriers 
lemma Bézout’s Lemma #1 <= ∀x, y:ℕ
  let e := euclid(x, y), g := (e)₁, s := (e)₂, t := (e)₃ in                (7)
    case sign(s) of ‘+’ : x·|s| = y·|t| + g, ‘−’ : x·|s| + g = y·|t| end_case
  end_let

lemma Bézout’s Lemma #2 <= ∀x, y:ℕ (euclid(x, y))₁ = gcd(x, y)            (8)


Fig. 4. Bézout’s Lemma 


of the rings R_n and R_n^m are subsets of ℕ. We therefore define 𝔍_E as shown in 
Fig. 3, which complicates the proof of Lemma 1 (with 𝔍 replaced by 𝔍_E), as this 
definition necessitates a proof of [x·y + x·(s mod y) ≡ 1] mod y if s < 0. 

Bézout’s Lemma is formulated in our system’s notation by the pair of lemmas 
displayed in Fig. 4. When prompted to prove Lemma 7, the system starts a Peano 
induction upon x but gets stuck in the step case. We therefore command to use 
induction corresponding to the recursion structure of procedure euclid. VeriFun 
responds by proving the base case and simplifying the induction conclusion in 
case sign(s) = ‘+’ to 


$$y \neq 0 \to x \cdot |t| + g = (x \bmod y) \cdot |t| + g + |t| \cdot (y - 1) \cdot (x/y) + |t| \cdot (x/y) \tag{i}$$


(where e abbreviates euclid(y, (x mod y)), g := (e)₁, s := (e)₂ and t := (e)₃) 
using the induction hypothesis 

  ∀x':ℕ let{e := euclid(x', (x mod y)), g := (e)₁, s := (e)₂, t := (e)₃;
           case{sign(s);
                ‘+’ : x'·|s| = (x mod y)·|t| + g,
                ‘−’ : x'·|s| + g = (x mod y)·|t|}}


and some basic arithmetic properties. We then instruct the system to use the 
quotient-remainder theorem for replacing x on the left-hand side of the equation 
in (i) by (x/y)·y + (x mod y), causing VeriFun to complete the proof. The system 
computes a similar proof obligation for case sign(s) = ‘−’, which is proved in the 
same way. 
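Written out, the arithmetic behind the step case for sign(s) = ‘+’ runs as follows, where g, s and t are the components of euclid(y, (x mod y)) and the induction hypothesis, instantiated with x' := y, supplies y·|s| = (x mod y)·|t| + g. This is a reconstruction of the reasoning for the reader, not output of VeriFun; the chain ends in the ‘−’ case of Lemma 7 for the triple < g, (‘−’, |t|), (‘+’, |s| + (x/y)·|t|) > returned in this branch.

```latex
\begin{align*}
y \cdot \big(|s| + (x/y) \cdot |t|\big)
  &= y \cdot |s| + |t| \cdot y \cdot (x/y) \\
  &= (x \bmod y) \cdot |t| + g + |t| \cdot (y-1) \cdot (x/y) + |t| \cdot (x/y)
     && \text{IH with } x' := y,\ y \neq 0 \\
  &= \big((x/y) \cdot y + (x \bmod y)\big) \cdot |t| + g \\
  &= x \cdot |t| + g
     && \text{quotient-remainder theorem}
\end{align*}
```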

By “basic arithmetic properties” we mean well-known facts like associativity, 
commutativity, distributivity, cancellation properties etc. of +, −, ·, /, gcd, ..., 
which are defined and proved in our arithmetic proof library. These facts are 
used almost everywhere by the Symbolic Evaluator, so we will not mention 
their use explicitly in the sequel. 

When called to prove Lemma 8 by induction corresponding to the recursion 
structure of procedure euclid, VeriFun responds by proving the base case and 
rewrites the step case with the induction hypothesis to 

$$y \neq 0 \to \gcd(x,y) = \gcd(y, x \bmod y). \tag{ii}$$
It then automatically continues with proving (ii) by induction corresponding 
to the recursion structure of procedure gcd, where it succeeds for the base and 
the step case. Lemma 8 is useful because it relates procedure euclid to procedure 
gcd of our arithmetic proof library so that all lemmas about gcd can be utilized 
for the current proofs. 

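For proving the inverse property 

$$\forall x,y{:}\mathbb{N}\;\; y \neq 0 \land \gcd(x,y) = 1 \to [x \cdot \mathfrak{J}_E(x,y) \equiv 1] \bmod y \tag{9}$$

of procedure 𝔍_E, we call the system to unfold the procedure call 𝔍_E(x, y). VeriFun 
responds by proving the statement for case sign(s) = ‘+’ using Bézout’s Lemmas 7 
and 8 and the library lemmas 

$$\forall x,y,z{:}\mathbb{N}\;\; z \neq 0 \land z \mid x \to [x + y \equiv y] \bmod z \tag{10}$$
$$\forall x,y{:}\mathbb{N}\;\; y \neq 0 \to y \mid x \cdot y \tag{11}$$

as well as (5), but gets stuck in the remaining case with proof obligation 

$$y \neq 0 \land \mathit{sign}(s) = \text{‘$-$’} \land g = 1 \to [x \cdot y - x \cdot (|s| \bmod y) \equiv 1] \bmod y \tag{iii}$$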


where g abbreviates (euclid(x, y))₁ and s stands for (euclid(x, y))₂. Proof obligation 
(iii) represents the unpleasant case of the proof development and necessitates 
the invention of an auxiliary lemma for completing the proof. After some 
unsuccessful attempts, we eventually came up with lemma 

$$\forall x,y,z,u{:}\mathbb{N}\;\; y \neq 0 \land y \mid (x \cdot z + u) \land x \geq u \to [x \cdot y - x \cdot (z \bmod y) \equiv u] \bmod y. \tag{12}$$


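For proving (iii), we command to use Lemma 12 for replacing the left-hand side 
of the congruence in (iii) by g, and VeriFun computes 

$$\begin{aligned} y \neq 0 \land \mathit{sign}(s) = \text{‘$-$’} \land g = 1 \to\ & (x \geq g \to y \mid (x \cdot |s| + g)) \\ \land\ & (x < g \to [x \cdot y - x \cdot (|s| \bmod y) \equiv 1] \bmod y). \end{aligned} \tag{iv}$$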


Now we can call the system to use Bézout’s Lemma 7 for replacing x·|s| + g 
in (iv) by y·|t|, causing VeriFun to complete the proof with Bézout’s Lemma 8 
and library lemma (11) in case of x ≥ g, and otherwise showing that x < g = 1 
entails x = 0 and in turn 1 = g = gcd(0, y) = y, so that x·y − x·(|s| mod y) 
simplifies to 0 and [0 ≡ 1] mod y rewrites to true. 

It remains to prove auxiliary lemma (12) for completing the proof of Lemma 9. 
After being called to use library lemma⁴ 

$$\forall x,y,z{:}\mathbb{N}\;\; z \neq 0 \land z \mid (x - y) \land z \mid (y - x) \to [x \equiv y] \bmod z \tag{13}$$


⁴ At least one of z | (x − y) or z | (y − x) holds trivially because subtraction is defined 
such that a − b = 0 iff a ≤ b. 
for replacing the left-hand side of the congruence in (12) by u, VeriFun computes 

$$y \neq 0 \land y \mid (x \cdot z + u) \land x \geq u \to y \mid (u - (x \cdot y - x \cdot (z \bmod y))) \tag{v}$$

with the library lemmas (11) and 

$$\forall x,y,z{:}\mathbb{N}\;\; z \neq 0 \land [x \equiv y] \bmod z \to z \mid (x - y) \tag{14}$$
$$\forall n,x,y,z{:}\mathbb{N}\;\; n \neq 0 \to [x + y \cdot (z \bmod n) \equiv x + y \cdot z] \bmod n. \tag{15}$$


We then command to use library lemma Va,y,2:.Nz40Anar<yroars<y-z 
(with u substituted for x, x for y and y — (z mod y) for z) after x factoring out, 


causing veriFun to prove (v) with the synthesized lemma? 


Va,y:N y #0 > y > (x mod y). (16) 
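Auxiliary lemma (12) was the creative step of this proof, so a quick exhaustive test over small values is a cheap way to convince oneself that it is true before investing in a formal proof. The Python sketch below (function names hypothetical) checks (12) for all x, y, z, u below a bound; the left-hand side x·y − x·(z mod y) is never negative because z mod y < y, which is exactly lemma (16).

```python
def congruent(a: int, b: int, n: int) -> bool:
    """[a ≡ b] mod n: n divides the integer difference a - b."""
    return (a - b) % n == 0


def lemma12_holds(bound: int) -> bool:
    """Exhaustively test auxiliary lemma (12) for all x, y, z, u below bound."""
    for x in range(bound):
        for y in range(1, bound):             # premise y ≠ 0
            for z in range(bound):
                for u in range(x + 1):        # premise x ≥ u
                    if (x * z + u) % y == 0:  # premise y | (x·z + u)
                        lhs = x * y - x * (z % y)  # non-negative since z mod y < y (16)
                        if not congruent(lhs, u, y):
                            return False
    return True


assert lemma12_holds(15)
```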


function 𝔍_N'(x, k:ℕ):ℕ <=
  if 2 > k
  then k
  else let h := ⌈k/2⌉, r := 𝔍_N'((x mod 2↑h), h), y := 2↑k in
         (2·r + ((r·r mod y)·x mod y)) mod y
       end_let
  end_if

function 𝔍_N(x, y:ℕ):ℕ <= if y ≠ 0 then y − 𝔍_N'(x, log₂(y)) end_if


Fig. 5. Computation of multiplicative inverses by Newton-Raphson iteration 
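The following Python sketch transcribes Fig. 5 (the names inv_newton_aux for 𝔍_N' and inv_newton for 𝔍_N are ours). It assumes x is odd and, for inv_newton, that y is a power of two, so that log₂(y) can be taken as y.bit_length() − 1. The recursion halves k at each step, which is where the logarithmic number of steps comes from; the assertions check Lemma 17 and the inverse property for small values.

```python
def inv_newton_aux(x: int, k: int) -> int:
    """𝔍_N'(x, k) of Fig. 5: for odd x, x·r mod 2**k = 2**k - 1 (Lemma 17)."""
    if k < 2:                                   # 2 > k
        return k
    h = (k + 1) // 2                            # ⌈k/2⌉
    r = inv_newton_aux(x % (1 << h), h)         # x·r ≡ -1 (mod 2**h)
    y = 1 << k
    return (2 * r + ((r * r % y) * x % y)) % y  # lifting step, all values natural


def inv_newton(x: int, y: int) -> int:
    """𝔍_N(x, y): inverse of odd x modulo a power of two y."""
    if y <= 0 or y & (y - 1):
        raise ValueError("y must be a power of 2")
    return y - inv_newton_aux(x, y.bit_length() - 1)  # bit_length() - 1 = log2(y)


assert all(x * inv_newton_aux(x, k) % (1 << k) == (1 << k) - 1
           for x in range(1, 200, 2) for k in range(13))
assert all(x * inv_newton(x, 1 << k) % (1 << k) == 1 % (1 << k)
           for x in range(1, 200, 2) for k in range(13))
```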


3.2 Newton’s Method 


Newton-Raphson iteration is a major tool in arbitrary-precision arithmetic, and 
efficient algorithms for computing multiplicative inverses are developed in 
combination with Hensel Lifting [2]. Figure 5 displays an implementation by procedure 
𝔍_N for odd numbers x and powers y of 2 (where ↑ computes exponentiation 
satisfying 0 ↑ 0 = 1). Procedure 𝔍_N is defined via procedure 𝔍_N', which is 
obtained from [3], viz. Algorithm 2' Recursive Hensel, where however ‘+’ instead 
of ‘−’ is used in the result term. Algorithm 2' was developed to compute a 
multiplicative inverse of x modulo p^k for any x not divisible by a prime p and 
returns a negative integer in most cases. By replacing ‘−’ with ‘+’, all 
calculations can be kept within ℕ, so that integer arithmetic is avoided. As procedure 
𝔍_N' computes the absolute value of the negative integer computed by Algorithm 
2', one additional subtraction is needed to obtain a multiplicative inverse, which 
is implemented by procedure 𝔍_N. The computation of 𝔍_N(x, 2^k) only requires 
log k steps (compared to k² steps for 𝔍_E(x, 2^k)), and therefore 𝔍_N is the method 
of choice for computing a Montgomery Inverse. 


⁵ Synthesized lemmas are a spin-off of the system’s termination analysis. 
However, Algorithm 2' is flawed, so that we wasted some time with our 
verification attempts: The four mod-calls in the algorithm are not needed for 
correctness, but care for efficiency, as they keep the intermediate numbers small. 
Now instead of using modulus 2^k for both inner mod-calls, Algorithm 2' calculates 
mod 2^{⌈k/2⌉}, thus spoiling correctness. As the flawed algorithm keeps the 
numbers even smaller, the use of mod 2^{⌈k/2⌉} could be beneficial indeed, and 
therefore it was not obvious to us whether we failed in the verification only because 
some mathematical argument was missing. But this consideration put us on 
the wrong track. Eventually frustrated by the unsuccessful verification 
attempts, we started VeriFun’s Disprover [1], which, to our surprise, came up 
with the counterexample x = 3, k = 2 for Lemma 17 in less than a second.⁶ We 
then repaired the algorithm as displayed in Fig. 5 and subsequently verified it 
(cf. Lemma 20). Later we learned that the fault in Algorithm 2' had not been 
recognized so far and that one cannot do better than to patch it as we did.⁷ 
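The counterexample is easy to reproduce. The Python sketch below parameterises the ℕ-version of Fig. 5 by a hypothetical flag flawed that applies only the change described above, i.e. modulus 2^{⌈k/2⌉} instead of 2^k for the two inner mod-calls. It is not a transcription of Algorithm 2' from [3], which works over the integers and uses ‘−’. For x = 3, k = 2 the repaired version satisfies Lemma 17 while the modified one does not.

```python
def newton_aux(x: int, k: int, flawed: bool = False) -> int:
    """𝔍_N' of Fig. 5; flawed=True uses 2**⌈k/2⌉ for the two inner mod-calls."""
    if k < 2:
        return k
    h = (k + 1) // 2
    r = newton_aux(x % (1 << h), h, flawed)
    y = 1 << k
    inner = (1 << h) if flawed else y  # modulus of the two inner mod-calls
    return (2 * r + ((r * r % inner) * x % inner)) % y


def lemma17(x: int, k: int, flawed: bool = False) -> bool:
    """Lemma 17: x·𝔍_N'(x, k) mod 2**k = 2**k - 1 for odd x."""
    return x * newton_aux(x, k, flawed) % (1 << k) == (1 << k) - 1


print(lemma17(3, 2), lemma17(3, 2, flawed=True))  # True False
```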

For proving the inverse property (20) of procedure 𝔍_N, we first have to verify 
the correctness statement 

$$\forall x,k{:}\mathbb{N}\;\; 2 \nmid x \to (x \cdot \mathfrak{J}_N'(x,k) \bmod 2^k) = 2^k - 1 \tag{17}$$

for procedure 𝔍_N': We call the system to use induction corresponding to the 
recursion structure of procedure 𝔍_N', which provides the induction hypothesis 

$$\forall x'{:}\mathbb{N}\;\; k \geq 2 \land 2 \nmid x' \to (x' \cdot \mathfrak{J}_N'(x', \lceil k/2 \rceil) \bmod 2^{\lceil k/2 \rceil}) = 2^{\lceil k/2 \rceil} - 1. \tag{18}$$

VeriFun proves the base case, but gets stuck in the step case with 


$$k \geq 2 \land 2 \nmid x \to (x \cdot ((2A + (x \cdot (A^2 \bmod 2^k) \bmod 2^k)) \bmod 2^k) \bmod 2^k) = 2^k - 1 \tag{i}$$

where A stands for 𝔍_N'((x mod 2^{⌈k/2⌉}), ⌈k/2⌉). By prompting the system to use 
Lemma 5, proof obligation (i) is simplified to 

$$k \geq 2 \land 2 \nmid x \to ((2B + B^2) \bmod 2^k) = 2^k - 1 \tag{ii}$$

(where B abbreviates x·A), thus eliminating the formal clutter resulting from 
the mod-calls in procedure 𝔍_N'. Next we replace 2B + B² by (B + 1)² − 1 and 
then call the system to replace B by (B/C)·C + R, where C = 2^{⌈k/2⌉} and 
R = ((x mod C)·A mod C), which is justified by the quotient-remainder theorem, as 
R rewrites to (B mod C) by library lemma (5). This results in proof obligation 

$$k \geq 2 \land 2 \nmid x \to ((((B/C) \cdot C + R + 1)^2 - 1) \bmod 2^k) = 2^k - 1 \tag{iii}$$


⁶ The Disprover is based on two heuristically controlled disproving calculi, and its 
implementation provides four selectable execution modes (Fast Search, Extended 
Search, Simple Terms and Structure Expansion). For difficult problems, the user 
may support the search for counterexamples by presetting some of the universally 
quantified variables with general terms or concrete values. 

⁷ Personal communication with Jean-Guillaume Dumas. 
and we command to use the induction hypothesis (18) for replacing R in (iii) by 
C − 1. VeriFun then responds by computing 

$$k \geq 2 \land 2 \nmid x \to ((((B/C) \cdot C + C)^2 - 1) \bmod 2^k) = 2^k - 1 \tag{iv}$$

using library lemmas ∀x, y, z:ℕ y ≠ 0 ∧ z ≠ 0 ∧ z | y → [(x mod y) ≡ x] mod z and 
(5) to prove 2 ∤ (x mod 2^{⌈k/2⌉}) for justifying the use of the induction hypothesis. 
When instructed to factor out C in (iv), the system computes 

$$k \geq 2 \land 2 \nmid x \to (((2^{\lceil k/2 \rceil})^2 \cdot (B/C + 1)^2 - 1) \bmod 2^k) = 2^k - 1. \tag{v}$$

We command to use library lemma 

$$\forall x,y,z{:}\mathbb{N}\;\; z \neq 0 \land z \nmid x \land z \mid y \land y \geq x \to ((y - x) \bmod z) = z - (x \bmod z) \tag{19}$$

for replacing the left-hand side of the equation in (v), yielding 

$$k \geq 2 \land 2 \nmid x \to 2^k - (1 \bmod 2^k) = 2^k - 1, \tag{vi}$$

justified by proof obligation 

$$k \geq 2 \land 2 \nmid x \to 2^k \neq 0 \land 2^k \nmid 1 \land 2^k \mid (2^{\lceil k/2 \rceil})^2 \cdot (B/C + 1)^2 \land (2^{\lceil k/2 \rceil})^2 \cdot (B/C + 1)^2 \geq 1,$$

which VeriFun simplifies to 

$$k \geq 2 \land 2 \nmid x \to 2^k \mid (2^{\lceil k/2 \rceil})^2 \cdot (B/C + 1)^2 \tag{vii}$$

in a first step. It then uses auxiliary lemma ∀x:ℕ x ≤ 2·⌈x/2⌉ and the library 
lemmas (11) and ∀x, y, z:ℕ x ≠ 0 ∧ z ≤ y → x^z | x^y for rewriting (vii) 
subsequently to true. Finally the system simplifies (vi) to true as well by unfolding 
the call of procedure mod, and Lemma 17 is proved. 
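Stripped of the tool interaction, the step case of Lemma 17 is the classical Hensel/Newton doubling argument. The following summary restates it in the notation above (C, A, B as defined there); it is our condensed reading of the proof obligations (ii) to (vii), not a transcript of VeriFun's proof.

```latex
\begin{align*}
&C = 2^{\lceil k/2 \rceil},\quad
 A = \mathfrak{J}_N'\big((x \bmod C), \lceil k/2 \rceil\big),\quad
 B = x \cdot A \\
&B \bmod C = \big((x \bmod C) \cdot A\big) \bmod C = C - 1
  && \text{by (5) and (18)} \\
&B + 1 = (B/C) \cdot C + C = C \cdot (B/C + 1) \\
&2B + B^2 = (B+1)^2 - 1 = C^2 \cdot (B/C + 1)^2 - 1 \\
&2^k \mid C^2 = 2^{2\lceil k/2 \rceil} \text{ since } k \le 2\lceil k/2 \rceil
  \;\Longrightarrow\; (2B + B^2) \bmod 2^k = 2^k - 1
\end{align*}
```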
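When called to verify the inverse property 

$$\forall x,y{:}\mathbb{N}\;\; 2 \nmid x \land 2^{?}(y) \to [x \cdot \mathfrak{J}_N(x,y) \equiv 1] \bmod y \tag{20}$$

of procedure 𝔍_N (where 2^?(y) decides whether y is a power of 2), VeriFun unfolds 
the call of procedure 𝔍_N and returns 

$$y \geq 2 \land 2 \nmid x \land 2^{?}(y) \to ((x \cdot y - x \cdot \mathfrak{J}_N'(x, \log_2(y))) \bmod y) = 1. \tag{viii}$$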
Now we instruct the system to use library lemma (19) for replacing the left-hand 
side of the equation in (viii), and VeriFun computes 

$$\begin{aligned} y \geq 2 \land 2 \nmid x \land 2^{?}(y) \to\ & (x \cdot \mathfrak{J}_N'(x, \log_2(y)) \bmod y) \neq 0 \\ \land\ & y - (x \cdot \mathfrak{J}_N'(x, \log_2(y)) \bmod y) = 1 \end{aligned} \tag{ix}$$

using auxiliary lemma ∀x, y:ℕ 2^?(y) → y > 𝔍_N'(x, log₂(y)) and the library 
lemmas (11), (14) and 

$$\forall x,y,z{:}\mathbb{N}\;\; x \cdot y > x \cdot z \to y > z. \tag{21}$$

Finally we let the system use library lemma ∀x:ℕ 2^?(x) → 2^{log₂(x)} = x to replace 
both moduli y in (ix) by 2^{log₂(y)}, causing VeriFun to rewrite both occurrences 
of (x·𝔍_N'(x, log₂(y)) mod y) with Lemma 17 to y − 1, and proof obligation (ix) 
to true in turn, thus completing the proof of (20). 
function i(x, y:ℕ):ℕ <= if y ≠ 0 then (x·⁻(y) mod y) end_if

function h(x, m, n:ℕ):ℕ <= if n ≠ 0 then (x·m mod n) end_if

function ⊗(x, y, m, n:ℕ):ℕ <= if n ≠ 0 then (x·y·𝔍(m, n) mod n) end_if
function h⁻¹(x, m, n:ℕ):ℕ <= if n ≠ 0 then (x·𝔍(m, n) mod n) end_if
function 𝔍(x, y:ℕ):ℕ <= if 2^?(y) then 𝔍_N(x, y) else 𝔍_E(x, y) end_if


Fig. 6. Procedures for verifying Montgomery Multiplication 
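For experimentation, the procedures of Fig. 6 translate directly into Python. In this sketch the hypothetical helper inv uses Python's built-in modular inverse pow(x, -1, y) (Python 3.8+) as a stand-in for 𝔍, instead of dispatching to 𝔍_N or 𝔍_E as Fig. 6 does; it assumes gcd(x, y) = 1 and y > 1. The assertions check that h⁻¹ undoes h, that h is a homomorphism for ⊗, and that i yields additive inverses.

```python
def inv(x: int, y: int) -> int:
    """Stand-in for 𝔍 (assumption): Python's modular inverse, needs gcd(x, y) = 1."""
    return pow(x, -1, y)


def i(x: int, y: int) -> int:      # i(x, y) = x·(y - 1) mod y, i.e. -x mod y
    return x * (y - 1) % y


def h(x: int, m: int, n: int) -> int:      # h(x) = x·m mod n
    return x * m % n


def mont_mul(x: int, y: int, m: int, n: int) -> int:  # x ⊗ y = x·y·m_n^{-1} mod n
    return x * y * inv(m, n) % n


def h_inv(x: int, m: int, n: int) -> int:  # h^{-1}(x) = x·m_n^{-1} mod n
    return x * inv(m, n) % n


m, n = 64, 45
assert all(h_inv(h(a, m, n), m, n) == a for a in range(n))
assert all(mont_mul(h(a, m, n), h(b, m, n), m, n) == h(a * b % n, m, n)
           for a in range(n) for b in range(n))
assert all((x + i(x, m)) % m == 0 for x in range(3 * m))  # i gives additive inverses
```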


4 Correctness of Montgomery Multiplication 


We continue by defining procedures for computing the functions i, h, ⊗ and h⁻¹ 
as displayed in Fig. 6, where we write i(x, y) instead of i_y(x) in the procedures 
and lemmas. As we aim to prove correctness of Montgomery Multiplication 
using procedure 𝔍_N for computing the Montgomery Inverse with minimal costs, 
2 ∤ n ∧ 2^?(m) instead of gcd(n, m) = 1 must be demanded to enable the use 
of Lemma 20 when proving the statements of Theorems 1 and 2. However, the 
multiplicative inverses n_m^{-1} and m_n^{-1} are both needed in the proofs (whereas only 
n_m^{-1} is used in applications of redc and redc*). Consequently procedure 𝔍_N cannot 
be used in the proofs, as it obviously fails in computing m_n^{-1} (except for the 
case n = m = 1, of course). This problem does not arise if procedure 𝔍_E is 
used instead, where gcd(n, m) = 1 is demanded, because 𝔍_E(n, m) = n_m^{-1} and 
𝔍_E(m, n) = m_n^{-1} for any coprimes n and m by Lemma 9. The replacement of 𝔍_E 
by 𝔍_N when computing the Montgomery Inverse must then be justified afterwards 
by additionally proving 

$$\forall x,y{:}\mathbb{N}\;\; 2 \nmid x \land 2^{?}(y) \to \mathfrak{J}_E(x,y) = \mathfrak{J}_N(x,y). \tag{22}$$


However, proving (22) would be a complicated and difficult enterprise because 
the recursion structures of procedures euclid and 𝔍_N' differ significantly. But we 
can overcome this obstacle by a simple workaround: We use procedure 𝔍 of Fig. 6 
instead of 𝔍_E in the proofs and let the system verify the inverse property 

$$\forall x,y{:}\mathbb{N}\;\; y \neq 0 \land \gcd(x,y) = 1 \to [x \cdot \mathfrak{J}(x,y) \equiv 1] \bmod y \tag{i}$$

of procedure 𝔍 beforehand: VeriFun easily succeeds with library lemma (4) and the 
inverse property (9) of procedure 𝔍_E after being instructed to use library lemma 
∀x, y, n:ℕ n ≥ 2 ∧ n | y ∧ gcd(x, y) = 1 → n ∤ x and the inverse property (20) of 
procedure 𝔍_N. Consequently 𝔍(n, m) = n_m^{-1} and 𝔍(m, n) = m_n^{-1} for any coprimes 
n and m, and therefore 𝔍 can be used in the proofs. The use of 𝔍_N instead of 𝔍 
when computing the Montgomery Inverse is justified afterwards with lemma 

$$\forall x,y{:}\mathbb{N}\;\; 2^{?}(y) \to \mathfrak{J}(x,y) = \mathfrak{J}_N(x,y)$$

having an obviously trivial (and automatic) proof. 
Central for the proofs of Theorems 1 and 2 is the key property 

$$\forall m,n,x{:}\mathbb{N}\;\; m > n \land n \cdot m > x \land \gcd(n,m) = 1 \to \mathit{redc}(x, i(\mathfrak{J}(n,m), m), m, n) = (x \cdot \mathfrak{J}(m,n) \bmod n) \tag{23}$$


of procedure redc: For proving Theorem 1.1 

$$\forall m,n,a{:}\mathbb{N}\;\; m > n > a \land \gcd(n,m) = 1 \to h(a,m,n) = \mathit{redc}(a \cdot (m \cdot m \bmod n), i(\mathfrak{J}(n,m), m), m, n), \tag{Thm 1.1}$$

we command to use (23) for replacing the right-hand side of the equation by 
(a·(m·m mod n)·𝔍(m, n) mod n). The system then replaces the left-hand side 
of the equation with a·m mod n by unfolding procedure call h(a, m, n) and 
simplifies the resulting equation to true with Lemma 2, the synthesized lemma 
(16) and the library lemmas (5) and 

$$\forall x,y,u,v{:}\mathbb{N}\;\; x > y \land u > v \to x \cdot u > y \cdot v. \tag{24}$$
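Theorems 1.2 and 1.3, viz.

$$\forall m,n,a,b{:}\mathbb{N}\;\; m > n > a \wedge n > b \wedge \gcd(n,m) = 1 \rightarrow \odot(a,b,m,n) = \mathit{redc}(a \cdot b, i(\mathcal{J}(n,m),m), m, n) \tag{Thm 1.2}$$

$$\forall m,n,a{:}\mathbb{N}\;\; m > n > a \wedge \gcd(n,m) = 1 \rightarrow h^{-1}(a,m,n) = \mathit{redc}(a, i(\mathcal{J}(n,m),m), m, n) \tag{Thm 1.3}$$

are (automatically) proved in the same way.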
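The following Python sketch checks Theorems 1.1 to 1.3 by exhaustive testing for small moduli. It assumes that $i(\mathcal{J}(n,m),m)$ yields the value $z$ with $n \cdot z \equiv -1 \pmod m$ (the usual Montgomery constant), that redc subtracts $n$ when $q \geq n$, and that the left-hand sides of (Thm 1.2) and (Thm 1.3) denote $a \cdot b \cdot m^{-1} \bmod n$ and $a \cdot m^{-1} \bmod n$. `montgomery_inverse` is a hypothetical helper, and `pow(n, -1, m)` needs Python 3.8 or later. Such testing is, of course, no substitute for the VeriFun proofs.

```python
from math import gcd


def montgomery_inverse(n: int, m: int) -> int:
    """Assumed meaning of i(J(n, m), m): the value z with n * z == -1 (mod m)."""
    assert gcd(n, m) == 1
    return (-pow(n, -1, m)) % m


def redc(x: int, z: int, m: int, n: int) -> int:
    """Montgomery reduction: x * m^{-1} mod n, assuming n*z == -1 (mod m) and x < n*m."""
    q = (x + n * (x * z % m)) // m   # m divides the numerator exactly
    return q - n if q >= n else q     # one conditional subtraction of n from q
```

For example, with $m = 16$ and $n = 7$, `redc(a * (16 * 16 % 7), montgomery_inverse(7, 16), 16, 7)` should equal `a * 16 % 7` for every `a < 7`, as (Thm 1.1) states.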

Having proved Theorem 1, it remains to verify the key property (23) for procedure redc (before we consider Theorem 2). We start by proving that division by $m$ in $R_n$ can be expressed by $\mathcal{J}$: We ask the system to prove

$$\forall m,n,x{:}\mathbb{N}\;\; n \neq 0 \wedge m \mid x \wedge \gcd(n,m) = 1 \rightarrow [x/m = x \cdot \mathcal{J}(m,n)] \bmod n \tag{25}$$

and VeriFun automatically succeeds with Lemma 2 and the library lemmas (4) and $\forall y,z{:}\mathbb{N}\; y \neq 0 \wedge y \mid z \rightarrow (z/y) \cdot y = z$.

As a consequence of Lemma 25, the quotient $q$ in procedure redc can in particular be expressed in $R_n$ by $\mathcal{J}$ (if redc is called with the Montgomery Inverse as actual parameter for the formal parameter $z$), which is stated by lemma

$$\forall m,n,x{:}\mathbb{N}\;\; n \neq 0 \wedge \gcd(n,m) = 1 \rightarrow [(x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m = x \cdot \mathcal{J}(m,n)] \bmod n. \tag{26}$$

To obtain a proof, we command the system to use Lemma 25 for replacing the left-hand side of the congruence in (26) by $(x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m)) \cdot \mathcal{J}(m,n)$, causing VeriFun to complete the proof using Lemma 3 as well as the library lemmas (5), (10), (11), (15) and $\forall x,y{:}\mathbb{N}\; y \neq 0 \rightarrow y \mid (x + (y-1) \cdot x)$.
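Lemmas (25) and (26) can be spot-checked numerically. The Python sketch below assumes that $\mathcal{J}(m,n)$ is the inverse of $m$ modulo $n$ and that $i(\mathcal{J}(n,m),m)$ is the value $z$ with $n \cdot z \equiv -1 \pmod m$; the function names and the search bound are hypothetical, and `pow(x, -1, mod)` requires Python 3.8 or later.

```python
from math import gcd


def check_lemma_25(m: int, n: int, bound: int = 600) -> bool:
    """For all multiples x of m below `bound`: x/m == x * m^{-1} (mod n)."""
    assert gcd(n, m) == 1 and n > 1
    m_inv = pow(m, -1, n)                 # plays the role of J(m, n)
    return all((x // m) % n == x * m_inv % n for x in range(0, bound, m))


def check_lemma_26(m: int, n: int, bound: int = 600) -> bool:
    """The quotient of redc is exact and congruent to x * m^{-1} modulo n."""
    assert gcd(n, m) == 1 and n > 1
    z = (-pow(n, -1, m)) % m              # assumed value of i(J(n, m), m)
    m_inv = pow(m, -1, n)
    for x in range(bound):
        numerator = x + n * (x * z % m)
        if numerator % m != 0 or (numerator // m) % n != x * m_inv % n:
            return False
    return True
```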

An obvious correctness condition for the method is that each call of redc (under the given requirements) computes some element of the residue class mod $n$. This is guaranteed by the conditional subtraction of $n$ from the quotient $q$ in the body of procedure redc. However, a single subtraction of $n$ from $q$ yields the desired property only if $n + n > q$ holds, which is formulated by lemma

$$\forall m,n,x{:}\mathbb{N}\;\; m \cdot n > x \rightarrow n + n > (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m. \tag{27}$$


We prompt the system to use a case analysis upon $m \cdot (n+n) > x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m)$, causing VeriFun to prove the statement in the positive case with the library lemmas (5) and $\forall x,y,z{:}\mathbb{N}\; z \cdot x > y \rightarrow x > y/z$, and to verify it in the negative case with the synthesized lemma (16) and the library lemmas (5), (21) and $\forall x,y,u,v{:}\mathbb{N}\; x > y \wedge u > v \rightarrow x + u > y + v$.
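Lemma (27) bounds the quotient by $2n$. The short Python check below confirms on small moduli that $n + n > q$ whenever $m \cdot n > x$, and hence that one conditional subtraction of $n$ yields a value below $n$, which is the content of (28). It assumes that $i(\mathcal{J}(n,m),m)$ is the value $z$ with $n \cdot z \equiv -1 \pmod m$; `quotient` and `check_lemma_27` are hypothetical names.

```python
from math import gcd


def quotient(x: int, z: int, m: int, n: int) -> int:
    """The quotient q computed in the body of redc (before the conditional subtraction)."""
    return (x + n * (x * z % m)) // m


def check_lemma_27(m: int, n: int) -> bool:
    """For all x < m*n: n + n > q, so a single subtraction of n brings q below n."""
    assert gcd(n, m) == 1
    z = (-pow(n, -1, m)) % m              # assumed value of i(J(n, m), m)
    for x in range(m * n):
        q = quotient(x, z, m, n)
        if not n + n > q:
            return False
        if not (q - n if q >= n else q) < n:
            return False
    return True
```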
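Now the mod $n$ property of procedure redc can be verified by proving lemma

$$\forall m,n,x{:}\mathbb{N}\;\; m > n \wedge n \cdot m > x \wedge \gcd(n,m) = 1 \rightarrow \mathit{redc}(x, i(\mathcal{J}(n,m),m), m, n) = (\mathit{redc}(x, i(\mathcal{J}(n,m),m), m, n) \bmod n). \tag{28}$$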
We let the system unfold the call of procedure mod in (28), causing VeriFun to use the synthesized lemma (16) for computing the simplified proof obligation

$$m > n \wedge n \cdot m > x \wedge \gcd(n,m) = 1 \rightarrow n > \mathit{redc}(x, i(\mathcal{J}(n,m),m), m, n). \tag{i}$$
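Then we command the system to unfold the call of procedure redc, which simplifies (i) to

$$m > n \wedge n \cdot m > x \wedge \gcd(n,m) = 1 \wedge (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m \geq n \rightarrow n > (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m - n. \tag{ii}$$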
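Finally we let the system use library lemma $\forall x,y,z{:}\mathbb{N}\; x > y \wedge y \geq z \rightarrow x - z > y - z$, resulting in proof obligation

$$\begin{aligned}
& m > n \wedge n \cdot m > x \wedge \gcd(n,m) = 1 \wedge (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m \geq n \\
& \wedge \big[\, n + n > (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m \wedge (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m \geq n \\
& \qquad \rightarrow (n+n) - n > (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m - n \,\big] \\
& \rightarrow n > (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m - n
\end{aligned} \tag{iii}$$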


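which simplifies to

$$\begin{aligned}
& m > n \wedge n \cdot m > x \wedge \gcd(n,m) = 1 \wedge (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m \geq n \\
& \wedge (n+n) - n > (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m - n \\
& \rightarrow n > (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m - n
\end{aligned} \tag{iv}$$

by Lemma 27, and in turn to true using the plus-minus cancellation.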
Now all lemmas for proving the key lemma (23) are available: We instruct the system to use Lemma 28 for replacing the left-hand side of the equation in (23) by $(\mathit{redc}(x, i(\mathcal{J}(n,m),m), m, n) \bmod n)$ and to apply lemma (26) for replacing the right-hand side by $((x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m \bmod n)$, resulting in the simplified proof obligation

$$m > n \wedge n \cdot m > x \wedge \gcd(n,m) = 1 \rightarrow [\mathit{redc}(x, i(\mathcal{J}(n,m),m), m, n) = (x + n \cdot (x \cdot i(\mathcal{J}(n,m),m) \bmod m))/m] \bmod n. \tag{v}$$

Then we unfold the call of procedure redc, causing the system to prove (v) with library lemma (5).
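Having proved the key lemma (23), the proof of Theorem 2

$$\forall m,n,a,j{:}\mathbb{N}\;\; m > n > a \wedge \gcd(n,m) = 1 \rightarrow (a^j \bmod n) = \mathit{redc}(\mathit{redc}^*(\mathit{redc}(a \cdot M, I, m, n), I, m, n, j), I, m, n) \tag{Thm 2}$$

(where $M = ((m \cdot m) \bmod n)$ and $I = i(\mathcal{J}(n,m),m)$) is easily obtained with the support of a further lemma, viz.

$$\forall m,n,a,j{:}\mathbb{N}\;\; m > n > a \wedge \gcd(n,m) = 1 \rightarrow (m \cdot a^j \bmod n) = \mathit{redc}^*(\mathit{redc}(a \cdot M, I, m, n), I, m, n, j). \tag{29}$$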


When called to use Peano induction upon $j$ for proving (29), VeriFun proves the base case and rewrites the step case with the induction hypothesis to

$$m > n > a \wedge \gcd(n,m) = 1 \wedge j \neq 0 \rightarrow (m \cdot a^{j-1} \cdot a \bmod n) = \mathit{redc}(\mathit{redc}(a \cdot M, I, m, n) \cdot (m \cdot a^{j-1} \bmod n), I, m, n). \tag{vi}$$

Then we command the system to replace both calls of redc with the key lemma (23), causing VeriFun to succeed with the lemmas (2), (5), (16) and (24).

Finally the system proves (Thm 2) using lemmas (2), (5), (16), (29) and library lemma $\forall x,y,z{:}\mathbb{N}\; x \neq 0 \wedge y > z \rightarrow x \cdot y > z$ after being prompted to use (Thm 1.3) for replacing the right-hand side of the equation in (Thm 2).
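The structure of (Thm 2) and (29) can be illustrated with the Python sketch below, which computes $a^j \bmod n$ entirely through Montgomery reductions. It assumes a reading of redc* suggested by (29) and (vi): start from $m \bmod n$ (the Montgomery form of 1) and multiply $j$ times by $y$ using redc. It further assumes that $I = i(\mathcal{J}(n,m),m)$ is the value with $n \cdot I \equiv -1 \pmod m$ and that redc subtracts $n$ when $q \geq n$. All function names are hypothetical.

```python
from math import gcd


def redc(x: int, z: int, m: int, n: int) -> int:
    """Montgomery reduction, assuming n*z == -1 (mod m) and x < n*m."""
    q = (x + n * (x * z % m)) // m
    return q - n if q >= n else q


def redc_star(y: int, z: int, m: int, n: int, j: int) -> int:
    """Assumed reading of redc*: j Montgomery multiplications by y, starting from m mod n."""
    acc = m % n                         # Montgomery form of 1, i.e. the case j = 0 of (29)
    for _ in range(j):
        acc = redc(y * acc, z, m, n)    # step case, cf. proof obligation (vi)
    return acc


def montgomery_power(a: int, j: int, m: int, n: int) -> int:
    """Right-hand side of (Thm 2) with M = m*m mod n and I = -n^{-1} mod m."""
    assert m > n > a and gcd(n, m) == 1
    big_i = (-pow(n, -1, m)) % m
    big_m = m * m % n
    return redc(redc_star(redc(a * big_m, big_i, m, n), big_i, m, n, j), big_i, m, n)
```

Invariant (29) corresponds to `acc` holding $m \cdot a^k \bmod n$ after $k$ loop iterations; the outer redc then strips the factor $m$.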


5 Discussion and Conclusion 


We presented machine-assisted proofs verifying an efficient implementation of Montgomery Multiplication, where we developed the proofs ourselves as we are not aware of corresponding proofs published elsewhere. Our work also uncovered a serious fault in a published algorithm for computing multiplicative inverses based on Newton-Raphson Iteration [3], which could have had dangerous consequences (particularly when used in cryptographic applications) had it remained undetected.
| | Proc. | Lem. | Rules | User | System | % | Steps | Time (mm:ss) |
|---|---|---|---|---|---|---|---|---|
| $J_E(n,m) = n_m^{-1}$ | 8 (7) | 49 (3) | 241 (39) | 36 (3) | 205 (36) | 85.1 (92.3) | 3171 | 0:19 |
| $J_N(n,m) = n_m^{-1}$ | 10 (9) | 76 (3) | 368 (59) | 59 (3) | 309 (56) | 84.0 (94.9) | 6692 | 1:32 |
| Theorems 1 & 2 | 20 (12) | 116 (3) | 547 (78) | 96 (6) | 451 (72) | 82.4 (92.3) | 9739 | 2:19 |

Fig. 7. Proof statistics


Figure 7 displays the effort for obtaining the proofs (including all procedures and lemmas which had been imported from our arithmetic proof library). Column Proc. counts the number of user-defined procedures (the recursively defined ones given in parentheses), Lem. is the number of user-defined lemmas (the number of synthesized lemmas given in parentheses), and Rules counts the total number of HPL-proof rule applications, separated into user-invoked (User) and system-initiated (System) ones (with the number of uses of Induction given in parentheses). Column % gives the automation degree, i.e. the ratio between System and Rules, Steps lists the number of first-order proof steps performed by the Symbolic Evaluator, and Time displays the runtime of the Symbolic Evaluator.⁸
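As a small arithmetic check of the % column, the Python sketch below recomputes the automation degree as System/Rules from the counts in Fig. 7. Applying the same ratio to the parenthesized Induction counts reproduces the parenthesized percentages, which suggests (our interpretation) that they give the automation degree for Induction alone. The names `ROWS` and `automation_degree` are hypothetical.

```python
# Counts copied from Fig. 7: (Rules, User, System) overall and for Induction only.
ROWS = {
    "J_E(n,m)": ((241, 36, 205), (39, 3, 36)),
    "J_N(n,m)": ((368, 59, 309), (59, 3, 56)),
    "Theorems 1 & 2": ((547, 96, 451), (78, 6, 72)),
}


def automation_degree(rules: int, user: int, system: int) -> float:
    """Column %: share of system-initiated rule applications, in percent."""
    assert user + system == rules       # User and System partition Rules
    return round(100 * system / rules, 1)


def table_percentages():
    return {
        name: (automation_degree(*overall), automation_degree(*induction))
        for name, (overall, induction) in ROWS.items()
    }
```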

The first two rows show the effort for proving Lemmas 9 and 20 as illustrated in Sect. 3. As can be observed from the numbers, verifying the computation of multiplicative inverses by Newton-Raphson Iteration is much more challenging for both the system and the user than the method based on Bézout’s Lemma. Row Theorems 1 & 2 below displays the effort for proving Theorems 1 and 2 as illustrated in Sect. 4 (with the effort for the proofs of Lemmas 9 and 20 included).

The numbers in Fig. 7 almost coincide with the statistics obtained for other case studies in Number Theory performed with the system (see e.g. [14] and also [7] for more examples), viz. an automation degree of ~85% and a success rate of ~95% for the induction heuristic. All termination proofs (and hence all required induction axioms) were obtained without user support, where 6 of the 12 recursively defined procedures, viz. mod, /, gcd, $\log_2$, euclid and $J_N$, do not terminate by structural recursion.⁹ While an automation degree of up to 100% can be achieved in mathematically simple domains, e.g. when sorting lists [7,9], values of 85% and below are not that satisfying when concerned with automated reasoning. The cause is that Number Theory quite often requires elaborate ideas for developing a proof, which are beyond the ability of the system’s heuristics guiding the proof search.¹⁰ We are also not aware of other reasoning systems offering more machine support for obtaining proofs in this difficult domain.


8 Time refers to running VeriFun 3.5 under Windows 7 Enterprise with an INTEL 
Core i7-2640M 2.80 GHz CPU using JAVA 1.8.0_45. 

⁹ Procedure $2^{(\ldots)}$ is not user defined, but synthesized as the domain procedure [12] of the incompletely defined procedure $\log_2$.

¹⁰ Examples are the use of the quotient-remainder theorem for proving (i) in Sect. 3.1 and (iii) in Sect. 3.2, which are the essential proof steps there, although more complex proof obligations result.
From the user’s perspective, this case study necessitated more work than expected, and it was a novel experience for us to spend some effort on verifying a very small and non-recursively defined procedure. The reason is that the correctness of procedure redc depends on some non-obvious and tricky number-theoretic principles, which made it difficult to spot the required lemmas. In fact, almost all effort was spent on the invention of the auxiliary lemmas in Sect. 4 and of Lemma 12 in Sect. 3.1. Once the “right” lemma for verifying a given proof obligation was eventually found, its proof turned out to be a routine task. The proof of Lemma 17 is an exception, as it required some thought to create it and some effort as well to lead the system (thus spoiling the proof statistics). Proof development was significantly supported by the system’s Disprover [1], which (besides detecting the fault in Algorithm 2’) often helped us avoid wasting time trying to prove a false conjecture, where the computed counterexamples provided useful hints on how to debug a lemma draft.
Abstract. Delay differential equations are fundamental for modeling
networked control systems where the underlying network induces delay 
for retrieving values from sensors or delivering orders to actuators. They 
are notoriously difficult to integrate as these are actually functional equa- 
tions, the initial state being a function. We propose a scheme to compute 
inner and outer-approximating flowpipes for such equations with uncer- 
tain initial states and parameters. Inner-approximating flowpipes are 
guaranteed to contain only reachable states, while outer-approximating 
flowpipes enclose all reachable states. We also introduce a notion of 
robust inner-approximation, which we believe opens promising perspec- 
tives for verification, beyond property falsification. The efficiency of our 
approach relies on the combination of Taylor models in time, with an 
abstraction or parameterization in space based on affine forms, or zono- 
topes. It also relies on an extension of the mean-value theorem, which 
allows us to deduce inner-approximating flowpipes, from flowpipes outer- 
approximating the solution of the DDE and its Jacobian with respect to 
constant but uncertain parameters and initial conditions. We present 
some experimental results obtained with our C++ implementation. 


1 Introduction 


Nowadays, many systems are composed of networks of control systems. These 
systems are highly critical, and formal verification is an essential element for 
their social acceptability. When the components of the system to model are 
distributed, delays are naturally introduced in the feedback loop. They may 
significantly alter the dynamics, and impact safety properties that we want to 
ensure for the system. The natural model for dynamical systems with such delays 
is Delay Differential Equations (DDE), in which time derivatives not only depend 
on the current state, but also on past states. Reachability analysis, which involves 
computing the set of states reached by the dynamics, is a fundamental tool for the 
verification of such systems. As the reachable sets are not exactly computable, 
approximations are used. In particular, outer (also called over)-approximating 
flowpipes are used to prove that error states will never be reached, whereas
inner (also called under)-approximating flowpipes are used to prove that desired 
states will actually be reached, or to falsify properties. We propose in this article 
a method to compute both outer- and inner-approximating flowpipes for DDEs. 
We concentrate on systems that can be modeled as parametric fixed-delay 
systems of DDEs, where both the initial condition and right-hand side of the 
system depend on uncertain parameters, but with a unique constant and exactly 
known delay: 
$$\begin{cases} \dot{z}(t) = f(z(t), z(t-\tau), \beta) & \text{if } t \in [t_0 + \tau, T] \\ z(t) = z_0(t, \beta) & \text{if } t \in [t_0, t_0 + \tau] \end{cases} \tag{1}$$

where the continuous vector of variables $z$ belongs to a state-space domain $D \subseteq \mathbb{R}^n$, the (constant) vector of parameters $\beta$ belongs to the domain $B \subseteq \mathbb{R}^m$, and $f : D \times D \times B \to D$ is $C^\infty$ and such that Eq. (1) admits a unique solution¹ on the time interval $[t_0, T]$. The initial condition is defined on $t \in [t_0, t_0 + \tau]$ by a function $z_0 : \mathbb{R}^+ \times B \to D$. The method introduced here also applies in
the case when the set of initial states is given as the solution of an uncertain 
system of ODEs instead of being defined by a function. Only the initialization of 
the algorithm will differ. When several constant delays occur in the system, the 
description of the method is more complicated, but the same method applies. 


Example 1. We will exemplify our method throughout the paper on the system 


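$$\begin{cases} x(t) = x_0(t, \beta) = (1 + \beta t)^2, & t \in [-\tau, 0] \\ \dot{x}(t) = -x(t)\, x(t - \tau) =: f(x(t), x(t - \tau), \beta), & t \in [0, T] \end{cases}$$

We take $\beta \in [\tfrac{1}{3}, 1]$, which defines a family of initial functions, and we fix $\tau = 1$.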
This system is a simple but not completely trivial example, for which we 
have an analytical solution on the first time steps, as detailed in Example 4. 


Contributions and Outline. In this work, we extend the method introduced by 
Goubault and Putot [16] for ODEs, to the computation of inner and outer flow- 
pipes of systems of DDEs. We claim, and experimentally demonstrate with our 
prototype implementation, that the method we propose here for DDEs is both 
simple and efficient. Relying on outer-approximations and generalized interval 
computations, all computations can be safely rounded, so that the results are 
guaranteed to be sound. Finally, we can compute inner-approximating flowpipes 
combining existentially and universally quantified parameters, which offers some 
strong potential for property verification, beyond falsification. 

In Sect. 2, we first define the notions of inner and outer-approximating flow- 
pipes, as well as robust inner-approximations, and state some preliminaries on 
generalized interval computations, which are instrumental in our inner flowpipes 
computations. We then present in Sect.3 our method for outer-approximating 


¹ We refer the reader to [12,27] for the conditions on $f$.
solutions to DDEs. It is based on the combination of Taylor models in time
with a space abstraction relying on zonotopes. Section 4 relies on this approach 
to compute outer-approximations of the Jacobian of the solution of the DDE 
with respect to the uncertain parameters, using variational equations. Inner- 
approximating tubes are obtained from these using a generalized mean-value 
theorem introduced in Sect.2. We finally demonstrate our method in Sect. 5, 
using our C++ prototype implementation, and show its superiority in terms of 
accuracy and efficiency compared to the state of the art. 


Related Work. Reachability analysis for systems described by ordinary differ- 
ential equations, and their extension to hybrid systems, has been an active 
topic of research in the last decades. Outer-approximations have been dealt with using ellipsoidal [20] and sub-polyhedral techniques, such as zonotopes or support functions, and Taylor model based methods, for both linear and nonlinear systems [2,4–6,10,14,17,26]. A number of corresponding implementations exist [1,3,7,13,22,25,29]. Far fewer methods have been proposed that answer the more difficult problem of inner-approximation. The existing approaches use
ellipsoids [21] or non-linear approximations [8,16, 19,31], but they are often com- 
putationally costly and imprecise. Recently, an interval-based method [24] was 
introduced for bracketing the positive invariant set of a system without rely- 
ing on integration. However, it relies on space discretization and has only been 
applied successfully, as far as we know, to low dimensional systems. 

Taylor methods for outer-approximating reachable sets of DDEs have been 
used only recently, in [28,32]. We will demonstrate that our approach improves 
the efficiency and accuracy over these interval-based Taylor methods. 

The only previous work we know of for computing inner-approximations of 
solutions to DDEs is the method of Xue et al. [30], extending the approach
proposed for ODEs in [31]. Their method is based on a topological condition and 
a careful inspection of what happens at the boundary of the initial condition. 
We provide in the section dedicated to experiments a comparison to the few 
experimental results given in [30]. 


2 Preliminaries on Outer and Inner Approximations 


Notations and Definitions. Let us introduce some notations that we will use throughout the paper. Set-valued quantities, scalar or vector valued, corresponding to uncertain inputs or parameters, are denoted with bold letters, e.g. $\boldsymbol{x}$. When an approximation is introduced by computation, we add brackets: outer-approximating enclosures are denoted in bold and enclosed within inward facing brackets, e.g. $[\boldsymbol{x}]$, and inner-approximations are denoted in bold and enclosed within outward facing brackets, e.g. $]\boldsymbol{x}[$.

An outer-approximating extension of a function $f : \mathbb{R}^m \to \mathbb{R}^n$ is a function $[f] : \mathcal{P}(\mathbb{R}^m) \to \mathcal{P}(\mathbb{R}^n)$, such that for all $\boldsymbol{x}$ in $\mathcal{P}(\mathbb{R}^m)$, $\mathrm{range}(f, \boldsymbol{x}) = \{f(x), x \in \boldsymbol{x}\} \subseteq [f](\boldsymbol{x})$. Dually, inner-approximations determine a set of values proved to belong to the range of the function over some input set. An inner-approximating extension of $f$ is a function $]f[ : \mathcal{P}(\mathbb{R}^m) \to \mathcal{P}(\mathbb{R}^n)$, such that for all $\boldsymbol{x}$ in $\mathcal{P}(\mathbb{R}^m)$, $]f[(\boldsymbol{x}) \subseteq \mathrm{range}(f, \boldsymbol{x})$. Inner and outer approximations can be interpreted as quantified propositions: $\mathrm{range}(f, \boldsymbol{x}) \subseteq [\boldsymbol{z}]$ can be written $(\forall x \in \boldsymbol{x})\,(\exists z \in [\boldsymbol{z}])\,(f(x) = z)$, while $]\boldsymbol{z}[\, \subseteq \mathrm{range}(f, \boldsymbol{x})$ can be written $(\forall z \in\, ]\boldsymbol{z}[)\,(\exists x \in \boldsymbol{x})\,(f(x) = z)$.

Let $\varphi(t, \beta)$ for time $t \geq t_0$ denote the time trajectory of the dynamical system (1) for a parameter value $\beta$, and $\boldsymbol{z}(t, \boldsymbol{\beta}) = \{\varphi(t, \beta), \beta \in \boldsymbol{\beta}\}$ the set of states reachable at time $t$ for the set of parameter values $\boldsymbol{\beta}$. We extend the notion of outer and inner-approximations to the case where the function is the solution $\varphi(t, \beta)$ of system (1) over the set $\boldsymbol{\beta}$. An outer-approximating flowpipe is given by an outer-approximation of the set of reachable states, for all $t$ in a time interval:


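Definition 1 (Outer-approximation). Given a vector of uncertain (constant) parameters or inputs $\beta \in \boldsymbol{\beta}$, an outer-approximation at time $t$ of the reachable set of states is $[\boldsymbol{z}](t, \boldsymbol{\beta}) \supseteq \boldsymbol{z}(t, \boldsymbol{\beta})$, such that $(\forall \beta \in \boldsymbol{\beta})\,(\exists z \in [\boldsymbol{z}](t, \boldsymbol{\beta}))\,(\varphi(t, \beta) = z)$.


Definition 2 (Inner-approximation). Given a vector of uncertain (constant) parameters or inputs $\beta \in \boldsymbol{\beta}$, an inner-approximation at time $t$ of the reachable set is $]\boldsymbol{z}[(t, \boldsymbol{\beta}) \subseteq \boldsymbol{z}(t, \boldsymbol{\beta})$ such that $(\forall z \in\, ]\boldsymbol{z}[(t, \boldsymbol{\beta}))\,(\exists \beta \in \boldsymbol{\beta})\,(\varphi(t, \beta) = z)$.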


In words, any point of the inner flowpipe is the solution at time t of system (1), 
for some value of $\beta \in \boldsymbol{\beta}$. If the outer and inner approximations are computed
accurately, they approximate with arbitrary precision the exact reachable set. 

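Our method will also solve the more general robust inner-approximation problem of finding an inner-approximation of the reachable set, robust to uncertainty on an uncontrollable subset $\beta_A$ of the vector of parameters $\beta$:


Definition 3 (Robust inner-approximation). Given a vector of uncertain (constant) parameters or inputs $\beta = (\beta_A, \beta_E) \in \boldsymbol{\beta}$, an inner-approximation of the reachable set $\boldsymbol{z}(t, \boldsymbol{\beta})$ at time $t$, robust with respect to $\boldsymbol{\beta}_A$, is a set $]\boldsymbol{z}[_A(t, \boldsymbol{\beta}_A, \boldsymbol{\beta}_E)$ such that $(\forall z \in\, ]\boldsymbol{z}[_A(t, \boldsymbol{\beta}_A, \boldsymbol{\beta}_E))\,(\forall \beta_A \in \boldsymbol{\beta}_A)\,(\exists \beta_E \in \boldsymbol{\beta}_E)\,(\varphi(t, \beta_A, \beta_E) = z)$.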


Outer and Inner Interval Approximations. Classical intervals are used in many situations to rigorously compute with interval domains instead of reals, usually leading to outer-approximations of function ranges over boxes. We denote the set of classical intervals by $\mathbb{IR} = \{[\underline{x}, \overline{x}],\ \underline{x} \in \mathbb{R}, \overline{x} \in \mathbb{R}, \underline{x} \leq \overline{x}\}$. Intervals are non-relational abstractions, in the sense that they rigorously approximate each component of a vector function $f$ independently. We thus consider in this section a function $f : \mathbb{R}^m \to \mathbb{R}$. The natural interval extension consists in replacing real operations by their interval counterparts in the expression of the function. A generally more accurate extension relies on a linearization by the mean-value theorem. Suppose $f$ is differentiable over the interval $\boldsymbol{x}$. Then, the mean-value theorem implies that $(\forall x_0 \in \boldsymbol{x})\,(\forall x \in \boldsymbol{x})\,(\exists c \in \boldsymbol{x})\,(f(x) = f(x_0) + f'(c)(x - x_0))$. If we can bound the range of the gradient of $f$ over $\boldsymbol{x}$ by $[f'](\boldsymbol{x})$, then we can derive the following interval enclosure, usually called the mean-value extension: for any $x_0 \in \boldsymbol{x}$, $\mathrm{range}(f, \boldsymbol{x}) \subseteq f(x_0) + [f'](\boldsymbol{x})(\boldsymbol{x} - x_0)$.
Example 2. Consider $f(x) = x^2 - x$; its range over $\boldsymbol{x} = [2,3]$ is $[2,6]$. The natural interval extension of $f$, evaluated on $[2,3]$, is $[f]([2,3]) = [2,3]^2 - [2,3] = [4,9] - [2,3] = [1,7]$. The mean-value extension gives $f(2.5) + [f']([2,3])([2,3] - 2.5) = [1.25, 6.25]$, using $x_0 = 2.5$ and $[f'](\boldsymbol{x}) = 2\boldsymbol{x} - 1$.
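Example 2 can be reproduced with a few lines of classical interval arithmetic. The Python sketch below (hypothetical `Interval` class) performs no outward rounding, so it illustrates the two extensions but is not a rigorous implementation; all endpoints in this example happen to be exactly representable in binary floating point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Classical interval [lo, hi]; no outward rounding (a rigorous version needs it)."""
    lo: float
    hi: float

    @staticmethod
    def of(v) -> "Interval":
        return v if isinstance(v, Interval) else Interval(v, v)

    def __add__(self, other):
        o = Interval.of(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __sub__(self, other):
        o = Interval.of(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, other):
        o = Interval.of(other)
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    __rmul__ = __mul__


def f(x):
    return x * x - x          # f(x) = x^2 - x, with two occurrences of x


def df(x):
    return 2 * x - 1          # f'(x) = 2x - 1


X = Interval(2, 3)
x0 = 2.5
natural = f(X)                              # natural interval extension
mean_value = f(x0) + df(X) * (X - x0)       # mean-value extension
```

Here `natural` evaluates to $[1,7]$ and `mean_value` to $[1.25, 6.25]$, both enclosing the true range $[2,6]$.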


Modal Intervals and Kaucher Arithmetic. The results introduced in this 
section are mostly based on the work of Goldsztejn et al. [15] on modal intervals. 
Let us first introduce generalized intervals, i.e., intervals whose bounds are not 
ordered, and the Kaucher arithmetic [18] on these intervals. 

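The set of generalized intervals is denoted by $\mathbb{IK} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \in \mathbb{R}, \overline{x} \in \mathbb{R}\}$. Given two real numbers $\underline{x}$ and $\overline{x}$, with $\underline{x} \leq \overline{x}$, one can consider two generalized intervals, $[\underline{x}, \overline{x}]$, which is called proper, and $[\overline{x}, \underline{x}]$, which is called improper. We define $\mathrm{dual}([a,b]) = [b,a]$ and $\mathrm{pro}([a,b]) = [\min(a,b), \max(a,b)]$.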


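Definition 4 ([15]). Let $f : \mathbb{R}^m \to \mathbb{R}$ be a continuous function and $\boldsymbol{x} \in \mathbb{IK}^m$, which we can decompose into $\boldsymbol{x}_A \in \mathbb{IR}^p$ and $\boldsymbol{x}_E \in (\mathrm{dual}\,\mathbb{IR})^q$ with $p + q = m$. A generalized interval $\boldsymbol{z} \in \mathbb{IK}$ is $(f, \boldsymbol{x})$-interpretable if

$$(\forall x_A \in \boldsymbol{x}_A)\,(Q_{\boldsymbol{z}}\, z \in \mathrm{pro}\,\boldsymbol{z})\,(\exists x_E \in \mathrm{pro}\,\boldsymbol{x}_E)\,(f(x) = z) \tag{2}$$

where $Q_{\boldsymbol{z}} = \exists$ if $\boldsymbol{z}$ is proper, and $Q_{\boldsymbol{z}} = \forall$ otherwise.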


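When all intervals in (2) are proper, we retrieve the interpretation of classical interval computation, which gives an outer-approximation of $\mathrm{range}(f, \boldsymbol{x})$, or $(\forall x \in \boldsymbol{x})\,(\exists z \in [\boldsymbol{z}])\,(f(x) = z)$. When all intervals are improper, (2) yields an inner-approximation of $\mathrm{range}(f, \boldsymbol{x})$, or $(\forall z \in\, ]\mathrm{pro}\,\boldsymbol{z}[)\,(\exists x \in \mathrm{pro}\,\boldsymbol{x})\,(f(x) = z)$.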

Kaucher arithmetic [18] provides a computation on generalized intervals that 
returns intervals that are interpretable as inner-approximations in some simple 
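cases. Kaucher addition extends addition on classical intervals by $\boldsymbol{x} + \boldsymbol{y} = [\underline{x} + \underline{y}, \overline{x} + \overline{y}]$ and $\boldsymbol{x} - \boldsymbol{y} = [\underline{x} - \overline{y}, \overline{x} - \underline{y}]$. For multiplication, let us decompose $\mathbb{IK}$ into $\mathcal{P} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \geq 0 \wedge \overline{x} \geq 0\}$, $-\mathcal{P} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \leq 0 \wedge \overline{x} \leq 0\}$, $\mathcal{Z} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \leq 0 \leq \overline{x}\}$, and $\mathrm{dual}\,\mathcal{Z} = \{\boldsymbol{x} = [\underline{x}, \overline{x}],\ \underline{x} \geq 0 \geq \overline{x}\}$. When restricted to proper intervals, the Kaucher multiplication coincides with the classical interval multiplication. Kaucher multiplication $\boldsymbol{x}\boldsymbol{y}$ extends the classical multiplication to all possible combinations of $\boldsymbol{x}$ and $\boldsymbol{y}$ belonging to these sets. We refer to [18] for more details.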

Kaucher arithmetic defines a generalized interval natural extension (see [15]): 


Proposition 1. Let $f : \mathbb{R}^m \to \mathbb{R}$ be a function, given by an arithmetic expression where each variable appears syntactically only once (and with degree 1). Then for $\boldsymbol{x} \in \mathbb{IK}^m$, $f(\boldsymbol{x})$, computed using Kaucher arithmetic, is $(f, \boldsymbol{x})$-interpretable.


In some cases, Kaucher arithmetic can thus be used to compute an inner-approximation of $\mathrm{range}(f, \boldsymbol{x})$. But the restriction to functions $f$ with single
occurrences of variables, that is with no dependency, prevents a wide use. A 
generalized interval mean-value extension allows us to overcome this limitation: 
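Theorem 1. Let $f : \mathbb{R}^m \to \mathbb{R}$ be differentiable, and $\boldsymbol{x} \in \mathbb{IK}^m$, which we can decompose into $\boldsymbol{x}_A \in \mathbb{IR}^p$ and $\boldsymbol{x}_E \in (\mathrm{dual}\,\mathbb{IR})^q$ with $p + q = m$. Suppose that for each $i \in \{1, \ldots, m\}$, we can compute $[\Delta_i] \in \mathbb{IR}$ such that

$$\left\{ \frac{\partial f}{\partial x_i}(x),\ x \in \mathrm{pro}\,\boldsymbol{x} \right\} \subseteq [\Delta_i]. \tag{3}$$

Then, for any $\tilde{x} \in \mathrm{pro}\,\boldsymbol{x}$, the following interval, evaluated with Kaucher arithmetic, is $(f, \boldsymbol{x})$-interpretable:

$$\tilde{f}(\boldsymbol{x}) = f(\tilde{x}) + \sum_{i=1}^{m} [\Delta_i](\boldsymbol{x}_i - \tilde{x}_i). \tag{4}$$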


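When using (4) for inner-approximation, we can only get the following subset of all possible cases in the Kaucher multiplication table: $(\boldsymbol{x} \in \mathcal{P}) \times (\boldsymbol{y} \in \mathrm{dual}\,\mathcal{Z}) = [\underline{x}\,\underline{y}, \underline{x}\,\overline{y}]$, $(\boldsymbol{x} \in -\mathcal{P}) \times (\boldsymbol{y} \in \mathrm{dual}\,\mathcal{Z}) = [\overline{x}\,\overline{y}, \overline{x}\,\underline{y}]$, and $(\boldsymbol{x} \in \mathcal{Z}) \times (\boldsymbol{y} \in \mathrm{dual}\,\mathcal{Z}) = 0$. Indeed, for an improper $\boldsymbol{x}$ and $\tilde{x} \in \mathrm{pro}\,\boldsymbol{x}$, it holds that $(\boldsymbol{x} - \tilde{x})$ is in $\mathrm{dual}\,\mathcal{Z}$. The outer-approximation $[\Delta_i]$ of the Jacobian is a proper interval, thus in $\mathcal{P}$, $-\mathcal{P}$ or $\mathcal{Z}$, and we can deduce from the multiplication rules that the inner-approximation is non-empty only if $[\Delta_i]$ does not contain 0.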


Example 3. Let $f$ be defined by $f(x) = x^2 - x$, for which we want to compute an inner-approximation of the range over $\boldsymbol{x} = [2,3]$. Due to the two occurrences of $x$, $f(\mathrm{dual}\,\boldsymbol{x})$, computed with Kaucher arithmetic, is not $(f, \boldsymbol{x})$-interpretable. The interval $\tilde{f}(\boldsymbol{x}) = f(2.5) + f'([2,3])(\boldsymbol{x} - 2.5) = 3.75 + [3,5](\boldsymbol{x} - 2.5)$ given by its mean-value extension, computed with Kaucher arithmetic, is $(f, \boldsymbol{x})$-interpretable. For $\boldsymbol{x} = [3,2] = \mathrm{dual}\,[2,3]$, using the multiplication rule for $\mathcal{P} \times \mathrm{dual}\,\mathcal{Z}$, we get $\tilde{f}(\boldsymbol{x}) = 3.75 + [3,5]([3,2] - 2.5) = 3.75 + [3,5][0.5, -0.5] = 3.75 + [1.5, -1.5] = [5.25, 2.25]$, which can be interpreted as: $(\forall z \in [2.25, 5.25])\,(\exists x \in [2,3])\,(z = f(x))$. Thus, $[2.25, 5.25]$ is an inner-approximation of $\mathrm{range}(f, [2,3])$.
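A minimal Python sketch of Example 3 follows. It implements only Kaucher addition, subtraction and the multiplication cases listed above for a proper $[\Delta_i]$ times an element of $\mathrm{dual}\,\mathcal{Z}$; `mean_value_inner` and the tuple encoding of generalized intervals are hypothetical choices, the derivative enclosure $[3,5]$ is passed in by hand, and no outward rounding is performed, so unlike a rigorous implementation the result is not guaranteed under floating-point rounding.

```python
from typing import Callable, Tuple

GInterval = Tuple[float, float]   # generalized interval (lower, upper); may be improper


def k_add(x: GInterval, y: GInterval) -> GInterval:
    return (x[0] + y[0], x[1] + y[1])


def k_sub(x: GInterval, y: GInterval) -> GInterval:
    return (x[0] - y[1], x[1] - y[0])


def k_mul_dual_z(a: GInterval, y: GInterval) -> GInterval:
    """Kaucher product of a proper interval a with y in dual Z (the only cases needed by (4))."""
    assert a[0] <= a[1] and y[0] >= 0 >= y[1]
    if a[0] >= 0:                      # a in P
        return (a[0] * y[0], a[0] * y[1])
    if a[1] <= 0:                      # a in -P
        return (a[1] * y[1], a[1] * y[0])
    return (0.0, 0.0)                  # a in Z (contains 0): empty inner-approximation


def mean_value_inner(f: Callable[[float], float], delta: GInterval,
                     lo: float, hi: float) -> GInterval:
    """Evaluate (4) in one variable on dual [lo, hi], taking the midpoint as x~."""
    x_tilde = (lo + hi) / 2
    x_dual = (hi, lo)                  # improper interval: existential quantification
    diff = k_sub(x_dual, (x_tilde, x_tilde))
    return k_add((f(x_tilde), f(x_tilde)), k_mul_dual_z(delta, diff))


result = mean_value_inner(lambda x: x * x - x, (3.0, 5.0), 2.0, 3.0)
```

The improper result `(5.25, 2.25)` is read as the inner-approximation $[2.25, 5.25]$.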


In Sect. 4, we will use Theorem 1 with $f$ being each component (for an $n$-dimensional system) of the solution of the uncertain dynamical system (1): we
need an outer enclosure of the solution of the system, and of its Jacobian with 
respect to the uncertain parameters. This is the objective of the next sections. 


3 Taylor Method for Outer Flowpipes of DDEs 


We now introduce a Taylor method to compute outer enclosures of the solution 
of system (1). The principle is to extend a Taylor method for the solution of 
ODEs to the case of DDEs, in a similar spirit to the existing work [28,32]. This 
can be done by building a Taylor model version of the method of steps [27], a 
technique for solving DDEs that reduces these to a sequence of ODEs. 
3.1 The Method of Steps for Solving DDEs


The principle of the method of steps is that on each time interval $[t_0 + i\tau, t_0 + (i+1)\tau]$, for $i \geq 1$, the function $z(t - \tau)$ is a known history function, already computed as the solution of the DDE on the previous time interval $[t_0 + (i-1)\tau, t_0 + i\tau]$. Plugging the solution of the previous ODE into the DDE yields a new ODE on the next time interval: we thus have an initial value problem for an ODE with $z(t_0 + i\tau)$ defined by the previous ODE. This process is initialized with $z_0(t)$ on the first time interval $[t_0, t_0 + \tau]$. The solution of the DDE can thus be obtained by solving a sequence of IVPs for ODEs. Generally, there is a discontinuity in the first derivative of the solution at $t_0 + \tau$. If this is the case, then because of the term $z(t - \tau)$ in the DDE, a discontinuity will also appear at each $t_0 + i\tau$.


Example 4. Consider the DDE defined in Example 1. On $t \in [0, \tau]$ the solution of the DDE is the solution of the ODE

$$\dot{x}(t) = f(x(t), x_0(t - \tau, \beta)) = -x(t)\,(1 + \beta(t - \tau))^2, \quad t \in [0, \tau]$$

with initial value $x(0) = x_0(0, \beta) = 1$. It admits the analytical solution

$$x(t) = \exp\left(-\frac{1}{3\beta}\left((1 + (t-1)\beta)^3 - (1 - \beta)^3\right)\right), \quad t \in [0, \tau]. \tag{5}$$

The solution of the DDE on the time interval $[\tau, 2\tau]$ is the solution of the ODE

$$\dot{x}(t) = -x(t)\exp\left(-\frac{1}{3\beta}\left((1 + (t - \tau - 1)\beta)^3 - (1 - \beta)^3\right)\right), \quad t \in [\tau, 2\tau]$$

with initial value $x(\tau)$ given by (5). An analytical solution can be computed, using the transcendental lower incomplete gamma function $\gamma$.
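For intuition, the Python sketch below applies the method of steps to Example 1 with a plain, non-rigorous integrator: Heun's second-order method on a grid whose step divides $\tau$, so that every delayed value $x(t - \tau)$ is an already computed grid value. The function names are hypothetical, and this is not the Taylor-model scheme developed in this section; it only shows how each delay interval reuses the previous one as history, and it can be compared with the analytical solution (5) on $[0, \tau]$.

```python
import math


def exact_first_interval(t: float, beta: float) -> float:
    """Analytical solution (5) of Example 4 on [0, tau], with tau = 1."""
    return math.exp(-((1 + (t - 1) * beta) ** 3 - (1 - beta) ** 3) / (3 * beta))


def method_of_steps(beta: float, tau: float = 1.0, n: int = 1000, delays: int = 2):
    """Heun integration of x'(t) = -x(t) x(t - tau) on [0, delays * tau].

    xs[i] approximates x(-tau + i * h). The first n + 1 entries come from the
    initial function x0(t) = (1 + beta t)^2; later entries are computed interval
    by interval, using the previously computed values as history.
    """
    h = tau / n
    xs = [(1 + beta * (-tau + i * h)) ** 2 for i in range(n + 1)]
    for step in range(delays * n):
        i = n + step                                       # xs[i] approximates x(step * h)
        x, lag0, lag1 = xs[i], xs[i - n], xs[i - n + 1]    # x(t), x(t - tau), x(t + h - tau)
        k1 = -x * lag0
        k2 = -(x + h * k1) * lag1
        xs.append(x + 0.5 * h * (k1 + k2))
    return h, xs
```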


3.2 Finite Representation of Functions as Taylor Models 


A sufficiently smooth function $g$ (e.g. $C^\infty$) can be represented on a time interval $[t_0, t_0 + h]$ by a Taylor expansion

$$g(t) = \sum_{i=0}^{k} (t - t_0)^i g^{[i]}(t_0) + (t - t_0)^{k+1} g^{[k+1]}(\xi), \tag{6}$$

with $\xi \in [t_0, t_0 + h]$, and using the notation $g^{[i]}(t) := \frac{g^{(i)}(t)}{i!}$. We will use such Taylor expansions to represent the solution $z(t)$ of the DDE on each time interval $[t_0 + i\tau, t_0 + (i+1)\tau]$, starting with the initial condition $z_0(t, \beta)$ on $[t_0, t_0 + \tau]$. For more accuracy, we actually define these expansions piecewise on a finer time grid of fixed time step $h$. The function $z(t, \beta)$ on time interval $[t_0, t_0 + \tau]$ is thus represented by $p = \tau/h$ Taylor expansions. The $l$-th such Taylor expansion, valid on the time interval $[t_0 + lh, t_0 + (l+1)h]$ with $l \in \{0, \ldots, p-1\}$, is

$$z_0(t, \beta) = \sum_{i=0}^{k} (t - t_0)^i z_0^{[i]}(t_0 + lh, \beta) + (t - t_0)^{k+1} z_0^{[k+1]}(\xi_l, \beta), \tag{7}$$

for a $\xi_l \in [t_0 + lh, t_0 + (l+1)h]$.
3.3 An Abstract Taylor Model Representation


In a rigorous version of the expansion (7), the $z_0^{[i]}(t_0 + lh, \beta)$ as well as $z_0^{[k+1]}(\xi_l, \beta)$ are set-valued, as the vector of parameters $\beta$ is set-valued. The simplest way to account for these uncertainties is to use intervals. However, this approach suffers heavily from the wrapping effect, as these uncertainties accumulate with integration time. A more accurate alternative is to use a Taylor form in the parameters $\beta$ for each $z_0^{[i]}(t_0 + lh, \beta)$. This is, however, very costly. We choose in this work to use a sub-polyhedric abstraction to parameterize Taylor coefficients, expressing some sensitivity of the model to the uncertain parameters: we rely on affine forms [9]. The result can be seen as Taylor models of arbitrary order in time, and order close to 1 in the parameter space.

The vector of uncertain parameters or inputs $\beta \in \boldsymbol{\beta}$ is thus defined as a vector of affine forms over $m$ symbolic variables $\varepsilon_i \in [-1, 1]$: $\boldsymbol{\beta} = \alpha_0 + \sum_{i=1}^{m} \alpha_i \varepsilon_i$, where the coefficients $\alpha_i$ are vectors of real numbers. This abstraction describes the set of values of the parameters as given within a zonotope. In the sequel, we will use for zonotopes the same bold letter notation as for intervals, to account for set-valued quantities.


Example 5. In Example 1, $\boldsymbol{\beta} = [\tfrac{1}{3}, 1]$ can be represented by the centered form $\boldsymbol{\beta} = \tfrac{2}{3} + \tfrac{1}{3}\varepsilon_1$. The set of initial conditions $x_0(t, \boldsymbol{\beta})$ is abstracted as a function of the noise symbol $\varepsilon_1$. For example, at $t = -1$, $x_0(-1, \boldsymbol{\beta}) = (1 - \boldsymbol{\beta})^2 = (1 - \tfrac{2}{3} - \tfrac{1}{3}\varepsilon_1)^2 = \tfrac{1}{9}(1 - \varepsilon_1)^2$. The abstraction of affine arithmetic operators is computed componentwise on the noise symbols $\varepsilon_i$, and does not introduce any over-approximation. The abstraction of non-affine operations is conservative: an affine approximation of the result is computed, and a new noise term is added that accounts for the approximation error. Here, using $\varepsilon_1^2 \in [0,1]$, affine arithmetic will yield $[x_0](-1, \boldsymbol{\beta}) = \tfrac{1}{9}(1 - 2\varepsilon_1 + [0,1]) = \tfrac{1}{9}(1.5 - 2\varepsilon_1 + 0.5\varepsilon_2)$, with $\varepsilon_2 \in [-1, 1]$. We are now using the notation $[x_0]$, denoting an outer-approximation. Indeed, the abstraction is conservative: $[x_0](-1, \boldsymbol{\beta})$ takes its values in $\tfrac{1}{9}[-1, 4]$, while the exact range of $x_0(-1, \beta)$ for $\beta \in [\tfrac{1}{3}, 1]$ is $\tfrac{1}{9}[0, 4]$.
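Example 5 can be replayed with the minimal affine-arithmetic sketch below, which uses exact rational arithmetic from Python's `fractions` module; the class and method names are hypothetical. The only non-affine operation implemented is squaring, which bounds the quadratic part $(\sum_i a_i\varepsilon_i)^2$ by $[0, r^2]$ with $r = \sum_i |a_i|$ and introduces a fresh noise symbol for it; with a single noise symbol this is exactly the rule $\varepsilon_1^2 \in [0,1]$ used above. Affine-arithmetic libraries may use other, often tighter, rules.

```python
from fractions import Fraction
from typing import Dict, Tuple


class Affine:
    """Affine form c + sum_i a_i * eps_i with noise symbols eps_i in [-1, 1]."""

    def __init__(self, center, coeffs: Dict[int, Fraction]):
        self.center = Fraction(center)
        self.coeffs = {k: Fraction(v) for k, v in coeffs.items() if v != 0}

    def radius(self) -> Fraction:
        return sum((abs(a) for a in self.coeffs.values()), Fraction(0))

    def __rsub__(self, scalar):
        # scalar - self: an affine operation, exact and componentwise on noise symbols
        return Affine(scalar - self.center, {k: -a for k, a in self.coeffs.items()})

    def square(self, fresh: int) -> "Affine":
        # (c + e)^2 = c^2 + 2 c e + e^2 with e = sum a_i eps_i. Since e^2 lies in
        # [0, r^2], it is replaced by r^2/2 + (r^2/2) * eps_fresh, where `fresh`
        # must be a noise symbol index not used so far.
        r2 = self.radius() ** 2
        coeffs = {k: 2 * self.center * a for k, a in self.coeffs.items()}
        coeffs[fresh] = r2 / 2
        return Affine(self.center ** 2 + r2 / 2, coeffs)

    def range(self) -> Tuple[Fraction, Fraction]:
        r = self.radius()
        return (self.center - r, self.center + r)


beta = Affine(Fraction(2, 3), {1: Fraction(1, 3)})   # beta = 2/3 + 1/3 eps_1
x0_at_minus_1 = (1 - beta).square(fresh=2)          # x0(-1, beta) = (1 - beta)^2
```

The result has center $\tfrac{1}{6} = \tfrac{1.5}{9}$ and coefficients $-\tfrac{2}{9}$ and $\tfrac{1}{18}$ for $\varepsilon_1$ and $\varepsilon_2$, i.e. $\tfrac{1}{9}(1.5 - 2\varepsilon_1 + 0.5\varepsilon_2)$, and its range is $\tfrac{1}{9}[-1, 4]$.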


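Now, we can represent the initial solution for $t \in [t_0, t_0 + \tau]$ of the DDE (1) as a Taylor model in time with zonotopic coefficients, by evaluating in affine arithmetic the coefficients of its Taylor model (7). Noting $r_{0j} = [t_0 + jh, t_0 + (j+1)h]$, we write, for all $j = 0, \ldots, p-1$,

$$[\boldsymbol{z}](t) = \sum_{l=0}^{k} (t - t_0)^l [\boldsymbol{z}_{0j}]^{[l]} + (t - t_0)^{k+1} [\tilde{\boldsymbol{z}}_{0j}], \quad t \in r_{0j} \tag{8}$$

where the Taylor coefficients

$$[\boldsymbol{z}_{0j}]^{[l]} = [\boldsymbol{z}_0]^{[l]}(t_0 + jh, \boldsymbol{\beta}) := \frac{[\boldsymbol{z}_0]^{(l)}(t_0 + jh, \boldsymbol{\beta})}{l!}, \qquad [\tilde{\boldsymbol{z}}_{0j}] = [\boldsymbol{z}_0]^{[k+1]}(r_{0j}, \boldsymbol{\beta}) \tag{9}$$

can be computed by differentiating the initial solution with respect to $t$ ($[\boldsymbol{z}_0]^{(l)}$ denotes the $l$-th time derivative), and evaluating the result in affine arithmetic.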
Example 6. Suppose we want to build a Taylor model of order $k = 2$ for the initial condition in Example 1 on a grid of step size $h = 1/3$. Consider the Taylor model for the first step $[t_0, t_0 + h] = [-1, -2/3]$: we need to evaluate $[x_{00}]^{[0]} = [x_0](-1, \beta)$, which was done in Example 5.

We also need $[x_{00}]^{[1]}$ and $[\tilde x_{00}]^{[2]}$. We compute $[x_{00}]^{[1]} = [\dot x_0](-1, \beta) = 2\beta(1 - \beta)$ and $[\tilde x_{00}]^{[2]} = [x_0]^{(2)}(\tau_{00})/2 = [\ddot x_0](\tau_{00})/2 = \beta^2$, with $\beta = \frac{2}{3} + \frac{1}{3}\varepsilon_1$. We evaluate these coefficients with affine arithmetic, similarly to Example 5.
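As a sanity check of (8) and (9) on Example 6, the sketch below evaluates the order-2 Taylor model of $x_0$ on the first cell $\tau_{00} = [-1, -2/3]$. For simplicity it uses plain interval arithmetic with exact rationals instead of affine arithmetic; the `Interval` class and the names `c0`, `c1`, `c2` and `taylor_model` are ours. Since $x_0$ is quadratic in $t$, the model is exact for each fixed $\beta$, so its interval evaluation must enclose every sampled value $x_0(t, \beta)$.

```python
from fractions import Fraction as Fr


class Interval:
    """Closed interval [lo, hi] with exact rational endpoints."""

    def __init__(self, lo, hi=None):
        self.lo = Fr(lo)
        self.hi = Fr(lo if hi is None else hi)

    def __add__(self, o):
        o = iv(o)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __sub__(self, o):
        o = iv(o)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, o):
        return iv(o) - self

    def __mul__(self, o):
        o = iv(o)
        ps = [a * b for a in (self.lo, self.hi) for b in (o.lo, o.hi)]
        return Interval(min(ps), max(ps))

    __rmul__ = __mul__

    def contains(self, x):
        return self.lo <= x <= self.hi


def iv(x):
    return x if isinstance(x, Interval) else Interval(x)


# Example 6: order k = 2, step h = 1/3, first cell tau_00 = [-1, -2/3].
beta = Interval(Fr(1, 3), 1)
t00, h = Fr(-1), Fr(1, 3)
c0 = (1 - beta) * (1 - beta)   # [x_00]^[0] = x0(-1, beta) = (1 - beta)^2
c1 = 2 * beta * (1 - beta)     # [x_00]^[1] = x0'(-1, beta) = 2 beta (1 - beta)
c2 = beta * beta               # remainder coefficient x0''(tau_00) / 2 = beta^2


def taylor_model(s):
    """Model (8) for j = 0, as a function of s = t - t00 (a point or an interval)."""
    return c0 + s * c1 + s * s * c2


enc = taylor_model(Interval(0, h))  # enclosure over the whole cell
print(enc.lo, enc.hi)               # 0 1
```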


3.4 Constructing Flowpipes 


The abstract Taylor models (8) introduced in Sect. 3.3 define piecewise outer-approximating flowpipes of the solution on $[t_0, t_0 + \tau]$. Using the method of steps, and plugging into (1) the solution computed on $[t_0 + (i-1)\tau, t_0 + i\tau]$, the solution of (1) can be computed by solving the sequence of ODEs

$$\dot z(t) = f(z(t), z(t - \tau), \beta), \quad \text{for } t \in [t_0 + i\tau, t_0 + (i+1)\tau] \qquad (10)$$

where the initial condition $z(t_0 + i\tau)$, and $z(t - \tau)$ for $t$ in $[t_0 + i\tau, t_0 + (i+1)\tau]$, are fully defined by (8) when $i = 1$, and by the solution of (10) at the previous step when $i$ is greater than 1.

Let the set of the solutions of (10) at time $t$ and for the initial conditions $z(t') \in \boldsymbol{z}'$ at some initial time $t' \geq t_0$ be denoted by $\boldsymbol{z}(t, t', \boldsymbol{z}')$. Using a Taylor method for ODEs, we can compute flowpipes that are guaranteed to contain the reachable set of the solutions $\boldsymbol{z}(t, t_0 + \tau, [z](t_0 + \tau))$ of (10), for all times $t$ in $[t_0 + \tau, t_0 + 2\tau]$, with $[z](t_0 + \tau)$ given by the evaluation of the Taylor model (8). This can be iterated for further steps of length $\tau$, solving (10) for $i = 1, \dots, T/\tau$, with an initial condition given by the evaluation of the Taylor model for (10) at the previous step.

We now detail the algorithm that results from this principle. Flowpipes are built using two levels of grids. At each step on the coarser grid with step size $\tau$, we define a new ODE. We build the Taylor models for the solution of this ODE on the finer grid of integration step size $h = \tau/p$. We denote by $t_i = t_0 + i\tau$ the points of the coarser grid, and by $t_{ij} = t_0 + i\tau + jh$ the points of the finer grid. In order to compute the flowpipes in a piecewise manner on this grid, the Taylor method relies on Algorithm 1. All Taylor coefficients, as well as Taylor expansion evaluations, are computed in affine arithmetic.


Algorithm 1. Sketch of the computation of outer reachable sets for a DDE
  Build by (9) the $[z_{0j}]^{[l]}$, $j \in \{0, \dots, p-1\}$, that define the Taylor model on $[t_0, t_0 + \tau]$
  Initialize next flowpipe: $[z_{10}] = [z_0](t_{10}, \beta)$ at $t_{10} = t_0 + \tau$
  For all $i = 1, \dots, T/\tau$ do
    For all $j = 0, \dots, p-1$ do
      Step 1: compute an a priori enclosure $[\tilde z_{ij}]$ of $z(t)$ valid on $[t_{ij}, t_{i(j+1)}]$
      Step 2: build by (12) to (14) a Taylor model valid on $[t_{ij}, t_{i(j+1)}]$
      Using (11), initialize next flowpipe: $[z_{i(j+1)}] = [z](t_{i(j+1)}, t_{ij}, [z_{ij}])$ if $j < p-1$, and $[z_{(i+1)0}] = [z](t_{(i+1)0}, t_{ij}, [z_{ij}])$ if $j = p-1$

Step 1: Computing an a Priori Enclosure. We need an a priori enclosure $[\tilde z_{ij}]$ of the solution $z(t)$, valid on the time interval $[t_{ij}, t_{i(j+1)}]$. This is done by a straightforward extension of the classical approach [26] for ODEs relying on the interval Picard-Lindelöf method, applied to Eq. (10) on $[t_{ij}, t_{i(j+1)}]$ with initial condition $[z_{ij}]$. If $[f]$ is Lipschitz, the natural interval extension $[F]$ of the Picard-Lindelöf operator, defined by $[F](z) = [z_{ij}] + [0, h]\,[f](z, [\tilde z_{(i-1)j}], \beta)$, where the enclosure of the solution over $\tau_{(i-1)j} = [t_{(i-1)j}, t_{(i-1)(j+1)}]$ has already been computed as $[\tilde z_{(i-1)j}]$, admits a unique fixpoint. A simple Jacobi-like iteration, $z_0 = [z_{ij}]$, $z_{l+1} = [F](z_l)$ for all $l \in \mathbb{N}$, suffices to reach the fixpoint of this iteration, which yields $[\tilde z_{ij}]$ and ensures the existence and uniqueness of a solution to (10) on $[t_{ij}, t_{i(j+1)}]$. However, it may be necessary to reduce the step size.


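Step 2: Building the Taylor Model. A Taylor model of order $k$ of the solution at $t_{ij}$, which is valid on the time interval $[t_{ij}, t_{i(j+1)}]$, for $i \geq 1$, is

$$[z](t, t_{ij}, [z_{ij}]) = [z_{ij}] + \sum_{l=1}^{k-1} (t - t_{ij})^l\, [f_{ij}]^{[l]} + (t - t_{ij})^k\, [\tilde f_{ij}]^{[k]} \qquad (11)$$

The Taylor coefficients are defined inductively, and can be computed by automatic differentiation, as follows:

$$[f_{ij}]^{[1]} = [f]([z_{ij}], [z_{(i-1)j}], \beta) \qquad (12)$$

$$[f_{1j}]^{[l+1]} = \frac{1}{l+1}\left( \left[\frac{\partial f^{[l]}}{\partial z}\right] [f_{1j}]^{[1]} + \left[\frac{\partial f^{[l]}}{\partial z^{\tau}}\right] [z_{0j}]^{[1]} \right) \qquad (13)$$

$$[f_{ij}]^{[l+1]} = \frac{1}{l+1}\left( \left[\frac{\partial f^{[l]}}{\partial z}\right] [f_{ij}]^{[1]} + \left[\frac{\partial f^{[l]}}{\partial z^{\tau}}\right] [f_{(i-1)j}]^{[1]} \right), \quad i \geq 2 \qquad (14)$$

The Taylor coefficients for the remainder term are computed in a similar way, evaluating $[f]$ over the a priori enclosure of the solution on $\tau_{ij} = [t_{ij}, t_{i(j+1)}]$. For instance, $[\tilde f_{ij}]^{[1]} = [f]([\tilde z_{ij}], [\tilde z_{(i-1)j}], \beta)$. The derivatives can be discontinuous at $t_{i0}$: the $[f_{i0}]^{[l]}$ coefficients correspond to the right-handed limit, at time $t_{i0}^{+}$.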
Let us detail the computation of the coefficients (12), (13) and (14). Let $z(t)$ be the solution of (10). By definition, $\dot z(t) = f(z(t), z(t - \tau), \beta) = f^{[1]}(z(t), z(t - \tau), \beta)$, from which we deduce the set-valued version (12). We can prove (14) by induction on $l$. Let us denote by $\partial_z$ the partial derivative with respect to $z(t)$, and by $\partial_{z^\tau}$ the partial derivative with respect to the delayed function $z(t - \tau)$. We have

$$\begin{aligned} f^{[l+1]}(z(t), z(t-\tau), \beta) &= \frac{1}{(l+1)!}\,\frac{d^{l+1} z}{dt^{l+1}}(t) = \frac{1}{l+1}\,\frac{d}{dt}\Big(f^{[l]}(z(t), z(t-\tau), \beta)\Big) \\ &= \frac{1}{l+1}\Big(\partial_z f^{[l]}\;\dot z(t) + \partial_{z^\tau} f^{[l]}\;\dot z(t-\tau)\Big) \\ &= \frac{1}{l+1}\Big(\partial_z f^{[l]}\; f(z(t), z(t-\tau), \beta) + \partial_{z^\tau} f^{[l]}\; f(z(t-\tau), z(t-2\tau), \beta)\Big) \end{aligned}$$

from which we deduce the set-valued version (14). For $t \in [t_0 + \tau, t_0 + 2\tau]$, the only difference is that $\dot z(t - \tau)$ is obtained by differentiating the initial solution of the DDE on $[t_0, t_0 + \tau]$, which yields (13).
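The recursion (13) and (14) is what automatic differentiation evaluates. The sketch below is a point (non-validated) analogue of Algorithm 1 for Example 1 with a fixed value of $\beta$: it uses the Cauchy-product recurrence of Taylor-mode automatic differentiation for $\dot z = -z\,z^\tau$ rather than the partial-derivative form of (13) and (14), plain floats instead of affine forms, and no remainder term, so it only illustrates the two-level grid and the reuse of the delayed cell. Because $t_{ij} - \tau = t_{(i-1)j}$, the Taylor coefficients of cell $(i-1, j)$ serve directly as the coefficients of $z(t - \tau)$ in cell $(i, j)$. The function names and the defaults $p = 20$, $k = 6$ are our choices.

```python
import math


def taylor_cell(z0, w, k):
    """Order-k Taylor coefficients, in s = t - t_ij, of the solution of
    z'(t) = -z(t) * z(t - tau) on one cell, given the coefficients w of the
    delayed solution z(t - tau) in the same variable s."""
    z = [z0]
    for l in range(k):
        # l-th coefficient of the product z * w (Cauchy product).
        conv = sum(z[m] * w[l - m] for m in range(l + 1) if l - m < len(w))
        z.append(-conv / (l + 1))
    return z


def horner(coeffs, s):
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * s + c
    return acc


def method_of_steps(beta, n_delays=1, p=20, k=6, t0=-1.0, tau=1.0):
    """Point (non-validated) analogue of Algorithm 1 for Example 1;
    returns an approximation of x(t0 + (n_delays + 1) * tau)."""
    h = tau / p
    # Cells (0, j): exact Taylor expansion of x0(t) = (1 + beta t)^2 at t_0j.
    prev = [[(1 + beta * tj) ** 2, 2 * beta * (1 + beta * tj), beta ** 2]
            for tj in (t0 + j * h for j in range(p))]
    z = (1 + beta * (t0 + tau)) ** 2        # [z_10] = x0(t0 + tau)
    for _ in range(n_delays):               # coarse grid, i = 1, 2, ...
        cur = []
        for j in range(p):                  # fine grid, j = 0, ..., p - 1
            # t_ij - tau = t_(i-1)j, so cell (i-1, j) gives z(t - tau) directly.
            coeffs = taylor_cell(z, prev[j], k)
            cur.append(coeffs)
            z = horner(coeffs, h)           # initial value of the next cell
        prev = cur
    return z


beta = 0.5
exact = math.exp(-(1 - (1 - beta) ** 3) / (3 * beta))   # closed form of x(1)
print(method_of_steps(beta), exact)
```

The reference value comes from the closed form $x(t) = \exp(-((1 + \beta(t-1))^3 - (1-\beta)^3)/(3\beta))$ on $[0, 1]$, obtained by separating variables in $\dot x = -x\,(1 + \beta(t-1))^2$.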


Example 7. As in Example 6, we build the first step of the Taylor model of order $k = 2$ on the system of Example 1. We consider $t \in [t_0 + \tau, t_0 + 2\tau]$, on a grid of step size $h = 1/3$. Let us build the Taylor model on $[t_0 + \tau, t_0 + \tau + h] = [0, 1/3]$: we need to evaluate $[x_{10}]$, $[f_{10}]^{[1]}$ and $[\tilde f_{10}]^{[2]}$ in affine arithmetic.

Following Algorithm 1, $[x_{10}] = [x_0](t_{10}, \beta) = [x_0](t_0 + \tau, \beta) = [x_0](0, \beta) = 1$. Using (12) and the computation of $[x_{00}]$ of Example 5, $[f_{10}]^{[1]} = [f]([x_{10}], [x_{00}]) = [f](1, \frac{1}{9}(1.5 - 2\varepsilon_1 + 0.5\varepsilon_2)) = -\frac{1}{9}(1.5 - 2\varepsilon_1 + 0.5\varepsilon_2)$. Finally, using (13), $[\tilde f_{10}]^{[2]} = 0.5\,[\dot f](\tau_{10}, \tau_{00})$, where $\tau_{i0}$ for $i = 0, 1$ (with $\tau_{00} = \tau_{10} - \tau$) is the time interval of width $h$ equal to $[t_{i0}, t_{i1}] = [-1 + i, -1 + i + 1/3]$, and

$$\dot f(t, t - \tau) = -\dot x(t)\,x(t - \tau) - x(t)\,\dot x(t - \tau) = -f(t, t - \tau)\,x(t - \tau) - x(t)\,\dot x_0(t - \tau) = x(t)\,x(t - \tau)^2 - 2x(t)\,\beta\,(1 + \beta(t - \tau)).$$

Thus, $[\tilde f_{10}]^{[2]} = 0.5\,[x(\tau_{10})]\,[x(\tau_{00})]^2 - [x(\tau_{10})]\,\beta\,(1 + \beta\tau_{00})$. We need enclosures for $x(\tau_{00})$ and $x(\tau_{10})$ to compute this expression. Enclosure $[x(\tau_{00})]$ is directly obtained as $[x_0](\tau_{00}) = (1 + \beta\tau_{00})^2$, evaluated in affine arithmetic. Evaluating $[x(\tau_{10})]$ requires computing an a priori enclosure of the solution on interval $\tau_{10}$, following the approach described as Step 1 in Algorithm 1. The Picard-Lindelöf operator is $[F](x) = [x_{10}] + [0, \frac{1}{3}]\,[f](x, [x(\tau_{00})], \beta) = 1 - [0, \frac{1}{3}]\,(1 + \beta\tau_{00})^2\,x$. We evaluate it in interval rather than affine arithmetic for simplicity: $[F](x) = 1 - [0, \frac{1}{3}]\,(1 + [\frac{1}{3}, 1][-1, -\frac{2}{3}])^2\,x = 1 - [0, \frac{49}{243}]\,x$. Starting with $X_0 = [x_{10}] = 1$, we compute $X_1 = [F](1) = [\frac{194}{243}, 1]$, then $X_2 = [F](X_1) = 1 - [0, \frac{49}{243}]\,[\frac{194}{243}, 1] = [\frac{194}{243}, 1] = X_1$: the iteration reaches its fixpoint after one step, and $[\frac{194}{243}, 1]$ is an a priori enclosure of the solution on $\tau_{10}$.
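The a priori enclosure of Example 7 can be checked with the short interval sketch below (exact rationals, plain intervals as in the example; the helper names are ours). It assumes the right-hand side $f(x, x^\tau) = -x\,x^\tau$ of Example 1 and implements the Jacobi iteration $X_{l+1} = [F](X_l)$ of Step 1, stopping as soon as $[F](X) \subseteq X$, which is the condition that makes $X$ a valid a priori enclosure.

```python
from fractions import Fraction as Fr


class Interval:
    """Closed interval [lo, hi] with exact rational endpoints."""

    def __init__(self, lo, hi=None):
        self.lo, self.hi = Fr(lo), Fr(lo if hi is None else hi)

    def __add__(self, o):
        o = iv(o)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __neg__(self):
        return Interval(-self.hi, -self.lo)

    def __mul__(self, o):
        o = iv(o)
        ps = [a * b for a in (self.lo, self.hi) for b in (o.lo, o.hi)]
        return Interval(min(ps), max(ps))

    __rmul__ = __mul__

    def subset(self, o):
        return o.lo <= self.lo and self.hi <= o.hi

    def __eq__(self, o):
        return (self.lo, self.hi) == (o.lo, o.hi)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"


def iv(x):
    return x if isinstance(x, Interval) else Interval(x)


def f(x, x_delayed):
    # Right-hand side of Example 1: x'(t) = -x(t) * x(t - tau).
    return -(x * x_delayed)


def a_priori_enclosure(z_start, h, x_delayed, max_iter=50):
    """Jacobi iteration X <- [F](X) for the interval Picard-Lindelof operator
    [F](X) = z_start + [0, h] * f(X, x_delayed); stops once [F](X) is in X."""
    X = iv(z_start)
    for _ in range(max_iter):
        FX = z_start + Interval(0, h) * f(X, x_delayed)
        if FX.subset(X):
            return X
        X = FX
    raise RuntimeError("no enclosure found: the step h may need to be reduced")


beta = Interval(Fr(1, 3), 1)
tau00 = Interval(-1, Fr(-2, 3))
x_tau00 = (1 + beta * tau00) * (1 + beta * tau00)   # [x0](tau_00) = (1 + beta tau_00)^2
enc = a_priori_enclosure(1, Fr(1, 3), x_tau00)
print(x_tau00, enc)   # [0, 49/81] [194/243, 1]
```

When no inclusion is found within `max_iter` iterations, the sketch gives up, which corresponds to the need to reduce the step size mentioned in Step 1.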


Remark. A fixed step size yields a simpler algorithm. However, it is possible to
use a variable step size, with an additional interpolation of the Taylor models.


4 Inner-Approximating Flowpipes 


We will now use Theorem 1 in order to compute inner-approximating flowpipes from outer-approximating flowpipes, extending the work [16] for ODEs to the case of DDEs. The main idea is to instantiate in this theorem the function $f$ as the solution $z(t, \beta)$ of our uncertain system (1) for all $t$, and $\boldsymbol{x}$ as the range $\boldsymbol{\beta}$ of the uncertain parameters. For this, we need to compute an outer-approximation of $z(t, \tilde\beta)$ for some $\tilde\beta \in \boldsymbol{\beta}$, and of its Jacobian matrix with respect to $\beta$ at any time $t$ and over the range $\boldsymbol{\beta}$. We follow the approach described in Sect. 3.4.


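Outer-Approximation of the Jacobian Matrix Coefficients. For the DDE (1) in arbitrary dimension $n \in \mathbb{N}$ and with parameter dimension $m \in \mathbb{N}$, the Jacobian matrix of the solution $z = (z_1, \dots, z_n)$ of this system with respect to the parameters $\beta = (\beta_1, \dots, \beta_m)$ is

$$J_{ij}(t) = \frac{\partial z_i}{\partial \beta_j}(t)$$

for $i$ between 1 and $n$, $j$ between 1 and $m$. Differentiating (1), we obtain that the coefficients of the Jacobian matrix of the flow satisfy

$$\dot J_{ij}(t) = \sum_{k=1}^{n} \frac{\partial f_i}{\partial z_k}(t)\, J_{kj}(t) + \sum_{k=1}^{n} \frac{\partial f_i}{\partial z^{\tau}_k}(t)\, J_{kj}(t - \tau) + \frac{\partial f_i}{\partial \beta_j}(t) \qquad (15)$$

with initial condition $J_{ij}(t) = (J_{ij})_0(t, \beta) = \frac{\partial (z_i)_0}{\partial \beta_j}(t, \beta)$ for $t \in [t_0, t_0 + \tau]$.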


Example 8. The Jacobian matrix for Example 1 is a scalar since the DDE is real-valued and the parameter is scalar. We easily get $\dot J_{11}(t) = -x(t - \tau)\,J_{11}(t) - x(t)\,J_{11}(t - \tau)$, with initial condition $(J_{11})_0(t, \beta) = 2t(1 + \beta t)$.


Equation (15) is a DDE of the same form as (1). We can thus use the method 
introduced in Sect. 3.4, and use Taylor models to compute outer-approximating 
flowpipes for the coefficients of the Jacobian matrix. 
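Before building flowpipes for (15), the sensitivity equation of Example 8 can be checked numerically. The sketch below (names ours, non-validated) integrates $x$ and $J_{11}$ together on $[0, 1]$ with a classical fourth-order Runge-Kutta scheme, taking the delayed values $x(t-1)$ and $J_{11}(t-1)$ from the initial function and its $\beta$-derivative, and compares $J_{11}(1)$ with a central finite difference of the closed-form solution $x(t) = \exp(-((1 + \beta(t-1))^3 - (1-\beta)^3)/(3\beta))$, which is valid on $[0, 1]$.

```python
import math


def x0(t, b):
    """Initial function of Example 1 on [-1, 0]."""
    return (1 + b * t) ** 2


def j0(t, b):
    """Its derivative with respect to beta: (J11)_0(t, beta) = 2t(1 + beta t)."""
    return 2 * t * (1 + b * t)


def rhs(t, y, b, tau=1.0):
    """Right-hand sides of x and J11 on [0, tau] (Example 8); the delayed
    values come from the initial function and its beta-derivative."""
    x, J = y
    xd, Jd = x0(t - tau, b), j0(t - tau, b)
    return (-x * xd, -xd * J - x * Jd)


def rk4(b, n=1000, t_end=1.0):
    h = t_end / n
    t, y = 0.0, [x0(0.0, b), j0(0.0, b)]
    for _ in range(n):
        k1 = rhs(t, y, b)
        k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)], b)
        k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)], b)
        k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)], b)
        y = [yi + h / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
             for yi, q1, q2, q3, q4 in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def x_exact(t, b):
    """Closed-form solution on [0, 1], obtained by separation of variables."""
    return math.exp(-((1 + b * (t - 1)) ** 3 - (1 - b) ** 3) / (3 * b))


b, eps = 0.5, 1e-5
x1, J1 = rk4(b)
J_fd = (x_exact(1.0, b + eps) - x_exact(1.0, b - eps)) / (2 * eps)
print(x1, x_exact(1.0, b), J1, J_fd)
```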


Computing Inner-Approximating Flowpipes. Similarly as for ODEs [16], the algorithm that computes inner-approximating flowpipes first uses Algorithm 1 to compute outer-approximations, on each time interval $[t_{ij}, t_{i(j+1)}]$, of

1. the solution $z(t, \tilde\beta)$ of the system starting from the initialization function $z_0(t, \tilde\beta)$ defined by a given $\tilde\beta \in \boldsymbol{\beta}$
2. the Jacobian $J(t, \beta)$ of the solution, for all $\beta \in \boldsymbol{\beta}$


Then, we can deduce inner-approximating flowpipes by using Theorem 1. Let, as in Definition 3, $\beta = (\beta_A, \beta_E)$ and denote by $J_A$ the matrix obtained by extracting the columns of the Jacobian corresponding to the partial derivatives with respect to $\beta_A$. Denote by $J_E$ the remaining columns. If the quantity defined by Eq. (16) for $t$ in $[t_{ij}, t_{i(j+1)}]$ is an improper interval

$$]z[_{AE}(t, t_{ij}, \boldsymbol{\beta}_A, \boldsymbol{\beta}_E) = [\tilde z](t, t_{ij}, [\tilde z_{ij}]) + [J]_A(t, t_{ij}, [J_{ij}])\,(\boldsymbol{\beta}_A - \tilde\beta_A) + [J]_E(t, t_{ij}, [J_{ij}])\,(\mathrm{dual}\,\boldsymbol{\beta}_E - \tilde\beta_E) \qquad (16)$$

where $[\tilde z]$ denotes the outer-approximating Taylor model of the center solution $z(t, \tilde\beta)$, then the interval $\mathrm{pro}\,]z[_{AE}(t, t_{ij}, \boldsymbol{\beta}_A, \boldsymbol{\beta}_E)$ is an inner-approximation of the reachable set $z(t, \boldsymbol{\beta})$ valid on the time interval $[t_{ij}, t_{i(j+1)}]$, which is robust with respect to the parameters $\beta_A$, in the sense of Definition 3. Otherwise the inner-approximation is empty. If all parameters are existentially quantified, that is, if the subset $\boldsymbol{\beta}_A$ is empty, we obtain the classical inner-approximation of Definition 2. Note that a single computation of the center solution $[\tilde z]$ and the Jacobian matrix $[J]$ can be used to infer different interpretations as inner-approximations or robust inner-approximations. With this computation, the robust inner flowpipes will always be included in the classical inner flowpipes.
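For a scalar output, (16) only needs Kaucher addition and one case of Kaucher multiplication. The sketch below is an illustration under stated assumptions, not the authors' implementation: intervals are `(lo, hi)` pairs of exact rationals (so no floating-point rounding mode has to be managed), the Jacobian enclosures of the existentially quantified parameters must not contain 0, and the product implements the Kaucher rule for a proper interval times the dual of a proper interval containing 0. It is applied to the initial function of Example 1 at $t = -2/3$ (a time chosen for illustration), and to a hypothetical linear function $z = a + 2e$ with $a$ universally and $e$ existentially quantified.

```python
from fractions import Fraction as Fr

# Intervals are (lo, hi) pairs; lo > hi denotes an improper (Kaucher) interval.


def k_add(x, y):
    """Kaucher addition, endpointwise for proper and improper intervals."""
    return (x[0] + y[0], x[1] + y[1])


def dual(x):
    return (x[1], x[0])


def mul_proper(x, y):
    """Classical product of two proper intervals."""
    ps = [a * b for a in x for b in y]
    return (min(ps), max(ps))


def k_mul_dual_zero(j, d):
    """Kaucher product of a proper j not containing 0 by an improper d with
    d[0] >= 0 >= d[1], i.e. the dual of a proper interval containing 0."""
    if j[0] > 0:
        return (j[0] * d[0], j[0] * d[1])
    if j[1] < 0:
        return (j[1] * d[1], j[1] * d[0])
    raise ValueError("Jacobian enclosure contains 0: not handled in this sketch")


def inner_16(center, A, E):
    """Scalar instance of (16). center: enclosure of z at the mid-point;
    A, E: lists of (jacobian_enclosure, parameter_box, mid_point) for the
    universally (A) and existentially (E) quantified parameters.
    Returns pro of the result when it is improper, otherwise None (empty)."""
    acc = center
    for j, (lo, hi), m in A:
        acc = k_add(acc, mul_proper(j, (lo - m, hi - m)))
    for j, (lo, hi), m in E:
        acc = k_add(acc, k_mul_dual_zero(j, dual((lo - m, hi - m))))
    return (acc[1], acc[0]) if acc[0] > acc[1] else None


# Example 1 at t = -2/3 (initial function): z = (1 + beta t)^2, beta in [1/3, 1].
t, bmid = Fr(-2, 3), Fr(2, 3)
c = (1 + bmid * t) ** 2
jac = mul_proper((2 * t, 2 * t), (1 + 1 * t, 1 + Fr(1, 3) * t))  # 2t(1 + beta t)
inner = inner_16((c, c), A=[], E=[(jac, (Fr(1, 3), Fr(1)), bmid)])

# Hypothetical z = a + 2e, a in [-1/10, 1/10] (for all), e in [-1, 1] (exists).
robust = inner_16((0, 0), A=[((1, 1), (Fr(-1, 10), Fr(1, 10)), 0)],
                  E=[((2, 2), (-1, 1), 0)])
too_wide = inner_16((0, 0), A=[((1, 1), (-3, 3), 0)], E=[((2, 2), (-1, 1), 0)])
print(inner, robust, too_wide)
```

The first result $[\frac{13}{81}, \frac{37}{81}]$ lies inside the exact range $[\frac{1}{9}, \frac{49}{81}]$. For $z = a + 2e$ the robust result $[-1.9, 1.9]$ is exact, and widening the range of $a$ to $[-3, 3]$ makes the Kaucher sum proper, so that the robust inner-approximation is empty: a wide $[J]_A(\boldsymbol{\beta}_A - \tilde\beta_A)$ term absorbs the improper contribution of the existential parameter.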

The computation of the inner-approximations fully relies on the outer-approximations at each time step. A consequence is that we can soundly implement most of our approach using classical interval-based methods: outward rounding should be used for the outer approximations of flows and Jacobians. Only the final computation by Kaucher arithmetic of improper intervals should be done with inward rounding in order to get a sound computation of the inner-approximation.

Also, the wider the outer-approximations in Taylor models for the center and the Jacobian, the smaller, and thus the less accurate, the inner-approximation. This can lead to an empty inner-approximation if the result of Eq. (16) in Kaucher arithmetic is not an improper interval. This can occur in two ways. Firstly, the Kaucher multiplication $[J]_E(\mathrm{dual}\,\boldsymbol{\beta}_E - \tilde\beta_E)$ in (16) yields a non-zero improper interval only if the Jacobian coefficients do not contain 0. Secondly, suppose that the Kaucher multiplication yields an improper interval. It is added to the proper interval $[\tilde z](t, t_{ij}, [\tilde z_{ij}]) + [J]_A(\boldsymbol{\beta}_A - \tilde\beta_A)$. The center solution $[\tilde z](t, t_{ij}, [\tilde z_{ij}])$ can be tightly estimated, but the term $[J]_A(\boldsymbol{\beta}_A - \tilde\beta_A)$ that measures robustness with respect to the $\beta_A$ parameters can lead to a wide enclosure. If this sum is wider than the improper interval resulting from the Kaucher multiplication, then the resulting Kaucher sum will be proper and the inner-approximation empty.


5 Implementation and Experiments 


We have implemented our method using the FILIB++ C++ library [23] for interval computations, the FADBAD++² package for automatic differentiation, and (a slightly modified version of) the aaflib³ library for affine arithmetic.

Let us first consider the running example, with order 2 Taylor models and an integration step size of 0.05. Figure 1 left presents the results until $t = 2$ (obtained in 0.03 s) compared to the analytical solution (dashed lines): the solid external lines represent the outer-approximating flowpipe, and the filled region represents the inner-approximating flowpipe. Until time $t = 0$, the DDE is in its initialization phase, and the conservativeness of the outer-approximation is due to the abstraction in affine arithmetic of the set of initialization functions. Using higher-order Taylor models, or refining the time step, improves the accuracy. However, for the inner-approximation, there is a specific difficulty: the Jacobian contains 0 at $t = -1$, so that the inner-approximation is reduced to a point. This case corresponds to the parameter value $\beta = 1$. To address this problem, we split the initial parameter set into two sub-intervals of equal width, compute independently the inner and outer flowpipes for these two parameter ranges, and then join the results to obtain Fig. 1 center. It is somewhat counter-intuitive that we can get this way a larger, thus better quality, inner-approximating set, as the inner-approximation corresponds to the property that there exists a value of $\beta$ in the parameter set such that a point of the tube is definitely reached. Taking a larger $\beta$ parameter set would intuitively lead to a larger such inner tube. However, this is in particular due to the fact that we avoid here the zero in the Jacobian. More generally, such a subdivision yields a tighter outer-approximation of the Jacobian, and thus better accuracy when using the mean-value theorem.

² http://www.fadbad.com.
³ http://aaflib.sourceforge.net.
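The effect of splitting the parameter set can be reproduced on the initial function of Example 1 with a few lines of exact arithmetic. In the sketch below (names ours), `inner_x0` applies (16) to $\beta \mapsto (1 + \beta t)^2$ at a fixed $t < 0$, with $\beta$ existentially quantified; the Kaucher product then reduces to $[m\,r, -m\,r]$, with $r$ the half-width of the parameter box and $m$ the smallest magnitude in the Jacobian enclosure. At $t = -1$ the Jacobian enclosure contains 0 and nothing is returned, while at $t = -2/3$ (a time chosen for illustration) splitting $[\frac{1}{3}, 1]$ into two halves increases the total inner width from $\frac{24}{81}$ to $\frac{32}{81}$, but leaves a gap between $\frac{22}{81}$ and $\frac{26}{81}$.

```python
from fractions import Fraction as Fr


def inner_x0(t, lo, hi):
    """Inner range of beta -> (1 + beta t)^2 over [lo, hi] at a fixed t < 0,
    from (16) with beta existentially quantified; None if the Jacobian
    enclosure contains 0 (case not handled here)."""
    mid = (lo + hi) / 2
    center = (1 + mid * t) ** 2
    # The Jacobian 2t(1 + beta t) is affine in beta: endpoints give its range.
    j_lo, j_hi = sorted([2 * t * (1 + lo * t), 2 * t * (1 + hi * t)])
    if j_lo <= 0 <= j_hi:
        return None
    r = (hi - lo) / 2                        # dual(beta - mid) = [r, -r]
    m = min(abs(j_lo), abs(j_hi))            # Kaucher product = [m r, -m r]
    return (center - m * r, center + m * r)  # pro of [center + m r, center - m r]


def width(iv):
    return iv[1] - iv[0]


t = Fr(-2, 3)
whole = inner_x0(t, Fr(1, 3), Fr(1))
halves = [inner_x0(t, Fr(1, 3), Fr(2, 3)), inner_x0(t, Fr(2, 3), Fr(1))]
print(whole, halves, inner_x0(Fr(-1), Fr(1, 3), Fr(1)))
```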


[Figure 1: three panels plotted against t (seconds). Panel titles include "2 subdivisions" and "10 subdivisions"; legends: outer approximation, inner approximation, analytical solution, and width(inner-approx) / width(outer-approx).]

Fig. 1. Running example (Taylor model order 2, step size 0.05)


In order to obtain an inner-approximation without holes, we can use a subdivision of the parameters with some covering. This is the case, for instance, using 10 subdivisions with 10% of covering. Results are now much tighter: Fig. 1 right represents a measure $\gamma(x, t)$ of the quality of the approximations (computed in 45 s) for a time horizon $T = 15$, with Taylor models of order 3 and a step size of 0.02. This accuracy measure is defined by $\gamma(x, t) = \gamma_u(x)/\gamma_o(x)$, where $\gamma_u(x)$ and $\gamma_o(x)$ measure respectively the width of the inner-approximation and of the outer-approximation for state variable $x$. Intuitively, the larger the ratio (bounded by 1), the better the approximation. Here, $\gamma(x, t)$ almost stabilizes after some time, at a high accuracy of 0.975. We noted that in this example, the order of the Taylor model, the step size and the number of initial subdivisions all have a notable impact on the stabilized value of $\gamma$, that can here be decreased arbitrarily.


Example 9. Consider a basic PD-controller for a self-driving car, controlling the car's position $x$ and velocity $v$ by adjusting its acceleration depending on the current distance to a reference position $p_r$, chosen here as $p_r = 1$. We consider a delay $\tau$ to transfer the input data to the controller, due to sensing, computation or transmission times. This leads, for $t \geq 0$, to:

$$\begin{aligned} \dot x(t) &= v(t) \\ \dot v(t) &= -K_p\,(x(t - \tau) - p_r) - K_d\, v(t - \tau) \end{aligned}$$

Choosing $K_p = 2$ and $K_d = 3$ guarantees the asymptotic stability of the controlled system when there is no delay. The system is initialized to a constant function $(x, v) \in [-0.1, 0.1] \times [0, 0.1]$ on the time interval $[-\tau, 0]$.
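To get a feel for the model of Example 9, the sketch below simulates a single trajectory from the constant initial function $(x, v) = (0, 0)$ (one point of the initial box), with the nominal gains. It is a non-validated point simulation, not a flowpipe: it uses Heun's method with a step $h = \tau/p$ so that both delayed arguments $t - \tau$ and $t + h - \tau$ fall on the grid. The function name and its defaults are ours.

```python
def simulate_pd(tau, x0=0.0, v0=0.0, kp=2.0, kd=3.0, pr=1.0, T=10.0, p=100):
    """Heun (explicit trapezoidal) integration of
         x'(t) = v(t),  v'(t) = -kp (x(t - tau) - pr) - kd v(t - tau),
    with the constant initial function (x0, v0) on [-tau, 0].
    The step h = tau / p puts every delayed argument on the grid."""
    h = tau / p
    n = round(T / h)
    xs, vs = [x0], [v0]                 # xs[k] approximates x(k h), k >= 0

    def delayed(k):
        # Values at time k h; for k < 0 this is the constant initial function.
        return (xs[k], vs[k]) if k >= 0 else (x0, v0)

    def rhs(v, xd, vd):
        return v, -kp * (xd - pr) - kd * vd

    for k in range(n):
        dx1, dv1 = rhs(vs[k], *delayed(k - p))
        v_pred = vs[k] + h * dv1        # predictor (x does not enter rhs)
        dx2, dv2 = rhs(v_pred, *delayed(k + 1 - p))
        xs.append(xs[k] + h / 2 * (dx1 + dx2))
        vs.append(vs[k] + h / 2 * (dv1 + dv2))
    return [k * h for k in range(n + 1)], xs, vs


ts, xs, vs = simulate_pd(0.2)
print(xs[-1], min(vs))
```

Comparing `min(vs)` for `simulate_pd(0.2)` and `simulate_pd(0.35)` is a quick, non-rigorous way to look at the effect of the delay on the velocity; only the flowpipes give guarantees over the whole initial box.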

This example demonstrates that even small delays can have a huge impact on the dynamics. We represent in the left subplot of Fig. 2 the inner and outer approximating flowpipes for the velocity and position, with delay $\tau = 0.35$, until time $T = 10$. They are obtained in 0.32 s, using Taylor models of order 3 and a time step of 0.03. The parameters were chosen such that the inner-approximation always remains non-empty. We now study the robustness of the behavior of the system to the parameters: $K_p$ and $K_d$ are time invariant, but now uncertain and known to be bounded by $(K_p, K_d) \in [1.95, 2.05] \times [2.95, 3.05]$. The Jacobian matrix is now of dimension $2 \times 4$. We choose a delay $\tau = 0.2$, sufficiently small not to induce oscillations. Thanks to the outer-approximation, we prove that the velocity never becomes negative, in contrast to the case of $\tau = 0.35$ where it is proved to oscillate. In Fig. 2 center, we represent, along with the outer-approximation, the inner-approximation and a robust inner-approximation. The inner-approximation, in the sense of Definition 2, contains only states for which it is proved that there exists an initialization of the state variables $x$ and $v$ in $[-0.1, 0.1] \times [0, 0.1]$ and a value of $K_p$ and $K_d$ in $[1.95, 2.05] \times [2.95, 3.05]$ such that these states are solutions of the DDE. The inner-approximation which is robust with respect to the uncertainty in $K_p$ and $K_d$, in the sense of Definition 3, contains only states for which it is proved that, whatever the values of $K_p$ and $K_d$ in $[1.95, 2.05] \times [2.95, 3.05]$, there exists an initialization of $x$ and $v$ in $[-0.1, 0.1] \times [0, 0.1]$ such that these states are solutions of the DDE. These results are obtained in 0.24 s, with order 3 Taylor models and a time step of 0.04. The robust inner-approximation is naturally included in the inner-approximation.

Fig. 2. Left and center: velocity and position of controlled car (left $\tau = 0.35$, center $\tau = 0.2$); Right: vehicles position in the platoon example


We now demonstrate the efficiency of our approach and its good scaling 
behavior with respect to the dimension of the state space, by comparing our 
results with the results of [30] on their seven-dimensional Example 3: 


Example 10. Let $\dot x(t) = f(x(t), x(t - \tau))$, $t \in [0, T]$, with delay $\tau = 0.01$, where

$$f(x(t), x(t - \tau)) = \big(1.4x_3(t) - 0.9x_1(t - \tau),\; 2.5x_5(t) - 1.5x_2(t),\; 0.6x_7(t) - 0.8x_3(t)x_2(t),\; 2 - 1.3x_4(t)x_3(t),\; 0.7x_1(t) - x_4(t)x_5(t),\; 0.3x_1(t) - 3.1x_6(t),\; 1.8x_6(t) - 1.5x_7(t)x_2(t)\big),$$

and the initial function is constant on $[-\tau, 0]$ with values in a box⁴ $[1.0, 1.2] \times [0.95, 1.15] \times [1.4, 1.6] \times [2.3, 2.5] \times [0.9, 1.1] \times [0.0, 0.2] \times [0.35, 0.55]$. We compute outer and inner approximations of the reachable sets of the DDE until time $t = 0.1$, and compare the quality measures $\gamma(x_1), \dots, \gamma(x_7)$ for the projection of the approximations over each variable $x_1$ to $x_7$, of our method with respect to [30]. We obtain for our work the measures 0.998, 0.996, 0.978, 0.964, 0.97, 0.9997, 0.961, to be compared to 0.575, 0.525, 0.527, 0.543, 0.477, 0.366, 0.523 for [30]. The results, computed with order 2 Taylor models, are obtained in 0.13 s with our method, and 505 s with [30]. Our implementation is thus both much faster and much more accurate. However, this comparison should only be taken as a rough indication, as it is unfair to [30] to compare their inner boxes to our projections on each component.

⁴ The first component is different from that given in [30], but is the correct initial condition, after discussion with the authors.


Example 11. Consider now the model, adapted from [11], of a platoon of $n$ autonomous vehicles. Vehicle $C_{i+1}$ is just after $C_i$, for $i = 1$ to $n - 1$. Vehicle $C_1$ is the leading vehicle. Sensors of $C_{i+1}$ measure its current speed $v_{i+1}$ as well as the speed $v_i$ of the vehicle just in front of it. Their respective positions are $x_{i+1}$ and $x_i$. We take a simple model where each vehicle $C_{i+1}$ accelerates so as to catch up with $C_i$ if it measures that $v_i > v_{i+1}$, and acts on its brakes if $v_i < v_{i+1}$. Because of communication, accelerations are delayed by some time constant $\tau$:

$$\begin{aligned} \dot x_i(t) &= v_i(t), & i &= 2, \dots, n \\ \dot v_{i+1}(t) &= \alpha\,\big(v_i(t - \tau) - v_{i+1}(t - \tau)\big), & i &= 2, \dots, n-1 \end{aligned}$$

We add an equation defining the way the leading car drives. We suppose it adapts its speed between 1 and 3, following a polynomial curve. This requires adapting the acceleration of vehicle $C_2$:

$$\begin{aligned} \dot x_1(t) &= 2 + (x_1(t)/5 - 1)(x_1(t)/5 - 2)(x_1(t)/5 - 3)/6 \\ \dot v_2(t) &= \alpha\,\big(2 + (x_1(t)/5 - 1)(x_1(t)/5 - 2)(x_1(t)/5 - 3)/6 - v_2(t - \tau)\big) \end{aligned}$$

We choose $\tau = 0.3$ and $\alpha = 2.5$. The initial position before time 0 of car $C_i$ is slightly uncertain, taken to be $-(i - 1) + [-0.2, 0.2]$, and its speed is in $[1.99, 2.01]$. We represent in the right subplot of Fig. 2 the inner and outer approximations of the position of the vehicles in a 5-vehicle platoon (9-dimensional system) until time $T = 10$, with a time step of 0.1 and order 3 Taylor models, computed in 2.13 s. As the inner-approximations of different vehicles intersect, there are some unsafe initial conditions, such that the vehicles will collide. This example allows us to demonstrate the good scaling of our method: for 10 vehicles (19-dimensional system) and with the same parameters, results are obtained in 6.5 s.
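The platoon equations leave the state layout implicit. The sketch below fixes one hypothetical layout, $z = (x_1, \dots, x_n, v_2, \dots, v_n)$, in which the leader's speed $v_1$ is not a state but is given by the polynomial of $x_1$; this yields the dimension $2n - 1$, i.e. 9 for 5 vehicles and 19 for 10 vehicles, as stated above. Function names are ours, and `z_delayed` stands for the state at $t - \tau$, which a method-of-steps integrator would supply.

```python
def leader_speed(x1):
    """Speed of the leading car C1 as a function of its position (Example 11)."""
    y = x1 / 5.0
    return 2.0 + (y - 1.0) * (y - 2.0) * (y - 3.0) / 6.0


def platoon_rhs(n, alpha, z, z_delayed):
    """Right-hand side for n vehicles, with the hypothetical state layout
    z = [x1, ..., xn, v2, ..., vn] (dimension 2n - 1); v1 is not a state
    but leader_speed(x1). z_delayed is the same vector at time t - tau."""
    x, v = z[:n], z[n:]                  # v[0] is v2, v[i] is v_(i+2)
    vd = z_delayed[n:]                   # delayed v2, ..., vn
    v1 = leader_speed(x[0])
    dx = [v1] + list(v)                  # x1' = leader speed, xi' = vi
    dv = [alpha * (v1 - vd[0])]          # v2' = alpha (v1(t) - v2(t - tau))
    dv += [alpha * (vd[i - 1] - vd[i]) for i in range(1, n - 1)]  # v3', ..., vn'
    return dx + dv


def initial_state(n):
    """Centre of the uncertain initial condition: x_i = -(i - 1), v_i = 2."""
    return [-(i - 1.0) for i in range(1, n + 1)] + [2.0] * (n - 1)


z = initial_state(5)
d = platoon_rhs(5, 2.5, z, z)   # constant history: z(t - tau) = z(t)
print(len(z), d)
```

At the centre of the initial box ($x_i = -(i-1)$, $v_i = 2$), the leader's polynomial speed is 1, so $C_2$ starts braking with $\dot v_2 = 2.5\,(1 - 2) = -2.5$, while the other accelerations are 0.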


6 Conclusion 


We have shown how to compute, efficiently and accurately, outer and inner flowpipes for DDEs with constant delay, using Taylor models combined with an efficient space abstraction. We have also introduced a notion of robust inner-approximation that can be computed by the same method. We would like to extend this work to fully general DDEs, including variable delay, as well as study further the use of such computations for property verification on networked control systems. Indeed, while testing is a weaker alternative to inner-approximation for property falsification, we believe that robust inner-approximation provides new tools towards robust property verification or control synthesis.